When Voice Became a Workflow
On any given day, my INTP brain overflows with *cool ideas* to streamline apps, workflows, and user experience across virtually every product or tool I touch.
I can't help it. It's an innate curiosity to understand how things work, what makes systems go, and how to make them go better.
For some, it's philosophical questions that keep them up at night (What is the meaning of life? Where do thoughts go?). For me, it's the idea that won’t let go—the one that whispers, "This could make that boring/repetitive process so much better."
Often these ideas stem naturally from the many side projects I dive into.
But being an INTP means my thoughts fire off in a thousand directions, often showing up while walking my dog or driving.
So I voice-record myself dictating story beats, scenes, or stray thoughts whenever inspiration strikes, using Google Recorder, Notion, or even messages I send myself on Slack. Combine that with 40,000 words scattered across apps and platforms, and you have a monster of an editing nightmare.
The Problem: Voice Notes Without Structure
I had hundreds of memos—journal entries, creative bursts, and brainstorming sessions—sitting in disorganized folders. They were unnamed, untagged, and completely unsearchable. Manually transcribing and tagging them was both tedious and error-prone. I needed something smarter. And repeatable.
And Then I Had an Idea...
What if I could:
Auto-transcribe every voice memo I create
Clean file names and log every transcript
Summarize each one using GPT and generate tags
Automatically organize all of it in Notion by chapter, scene, or theme
BOOM.
A personal creative ops pipeline, built with Bash, Python, Whisper CLI, GPT, REST APIs, and Notion. Here's how I built it, and how it became a proving ground for my growth in DevOps, automation, and product strategy.
Gotta Have a Dream: What I Want This Pipeline to Do
First Stage
Trigger auto-transcription for all .m4a files in a given directory
Sanitize filenames for safe command-line execution
Log transcript creation for audit tracking
Generate GPT summaries and keyword tags
Second Stage
Connect transcript files to Notion via REST API
Parse and incorporate typewritten material from Notion and local drafts
Auto-summarize chapters or sections with GPT
Tag and organize output for fast indexing
Third Stage
Organize and index in Notion by:
Index ID
Tags / Keywords
Chapter / Scene reference
Word counts
Pull all three stages into one cohesive pipeline using Zapier triggers
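That last piece is less exotic than it sounds: a Zapier "Catch Hook" webhook just gives you a URL, and a finished stage can POST to it to kick off the next Zap. As a rough sketch (the URL below is only the placeholder format Zapier generates, and the payload fields are illustrative):

# Notify Zapier that a transcript is ready for the next stage (sketch)
curl -s -X POST "https://hooks.zapier.com/hooks/catch/123456/abcdef/" \
  -H "Content-Type: application/json" \
  -d '{"transcript": "transcripts/chapter-notes.json", "status": "ready"}'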
As a Certified Scrum Master and Product Owner, I thrive on solving complex problems and turning fragmented chaos into clean, scalable systems. This project gave me free rein to dive deep into scripting, automation, and systems orchestration.
It also gave me a perfect reason to level up my skills:
Bash & Linux
Python scripting
GPT / prompt engineering
REST API development
Zapier integration
Notion API workflows
I realized how much I love building and managing automated solutions that scale. This wasn’t just a side project—it became a lab for testing real-world DevOps, AI workflows, and knowledge operations.
The Stack I Used
🎧 Whisper CLI
I chose OpenAI's Whisper CLI to transcribe audio files (.m4a, .mp3, .wav). It’s impressively accurate and customizable via model size (I use the "medium" model).
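A single memo goes through a command roughly like this (the file and directory names are just placeholders):

# Transcribe one memo with the medium model, writing JSON output
whisper "memos/chapter-notes.m4a" \
  --model medium \
  --output_format json \
  --output_dir transcripts/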
✨ Bash + Cron Jobs
The backbone of the pipeline is a series of Bash scripts:
transcribe.sh — for a single file
batch-whisper-transcribe.sh — batch processing
clean-filenames.sh — sanitizes input filenames
whisper_transcript_log.sh — logs transcript creation events
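For the curious, here's roughly what the cleanup and batch scripts boil down to. The loop bodies, paths, and cron schedule below are illustrative sketches, not my exact scripts:

# clean-filenames.sh (sketch): swap spaces for underscores, drop shell-unfriendly characters
for f in "$HOME/VoiceMemos"/*.m4a; do
  clean="$(basename "$f" | tr ' ' '_' | tr -cd '[:alnum:]._-')"
  [ "$clean" != "$(basename "$f")" ] && mv "$f" "$HOME/VoiceMemos/$clean"
done

# batch-whisper-transcribe.sh (sketch): transcribe anything without a transcript yet
for f in "$HOME/VoiceMemos"/*.m4a; do
  base="$(basename "$f" .m4a)"
  [ -f "transcripts/$base.json" ] && continue   # skip files already transcribed
  whisper "$f" --model medium --output_format json --output_dir transcripts/
done

# crontab entry: run the batch nightly at 2am and append output to a log
0 2 * * * /home/me/scripts/batch-whisper-transcribe.sh >> /home/me/logs/whisper.log 2>&1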
🚀 GPT API
Summarizes the transcript and creates tags for later indexing.
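The heavy lifting happens in the Python step, but the underlying request is simple enough to show as a one-off curl sketch. The model name, prompt wording, and file path here are placeholders:

# Ask GPT for a summary plus keyword tags (sketch; requires jq 1.6+)
curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --rawfile transcript transcripts/chapter-notes.json '{
        model: "gpt-4o-mini",
        messages: [
          {role: "system", content: "Summarize this transcript in three sentences, then list five keyword tags."},
          {role: "user", content: $transcript}
        ]
      }')"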
📂 Notion API
Pushes structured content into my Notion workspace.
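Each transcript ultimately becomes a page in a Notion database. The property names below ("Name", "Tags", "Summary") are whatever your database schema defines, and the values are placeholders:

# Create a Notion page for one transcript (sketch)
curl -s -X POST https://api.notion.com/v1/pages \
  -H "Authorization: Bearer $NOTION_API_KEY" \
  -H "Notion-Version: 2022-06-28" \
  -H "Content-Type: application/json" \
  -d '{
        "parent": { "database_id": "'"$NOTION_DB_ID"'" },
        "properties": {
          "Name":    { "title": [{ "text": { "content": "Chapter 3 voice memo" } }] },
          "Tags":    { "multi_select": [{ "name": "chapter-3" }, { "name": "scene-draft" }] },
          "Summary": { "rich_text": [{ "text": { "content": "GPT summary goes here." } }] }
        }
      }'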
The Pipeline (Simplified)
Audio File (.m4a)
↓
Bash Script (transcribe)
↓
Whisper CLI
↓
Transcript JSON
↓
Python script
↓
GPT Summary
↓
Formatted payload → Notion API
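In practice the whole diagram collapses into one small driver script. The sketch below assumes filenames are already sanitized, that the logging script takes a single file as an argument, and that the Python step is called something like summarize_and_push.py (a name used here purely for illustration):

#!/usr/bin/env bash
# pipeline.sh (sketch): run one memo end to end
set -euo pipefail
memo="$1"                                      # e.g. memos/chapter-notes.m4a
whisper "$memo" --model medium --output_format json --output_dir transcripts/
./whisper_transcript_log.sh "$memo"            # append a line to the audit log
base="$(basename "${memo%.m4a}")"
python3 summarize_and_push.py "transcripts/$base.json"   # GPT summary + Notion payload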
Key Lessons Learned
Automation = peace of mind. I don’t waste time manually naming, tagging, or processing files anymore.
Bash is a force multiplier. It unlocked new workflows and deepened my understanding of how Linux systems operate.
APIs are the glue. REST APIs made it possible to bind all these tools together.
Being "technical" is mindset first. Building this pipeline helped me bridge strategy and engineering—and reminded me how much I enjoy both.
What’s Next (v2 Goals)
Add diarization (speaker labels)
Expand to multi-file summarization
Build a dashboard to monitor performance + errors
Release as a self-hosted toolkit with Zapier triggers
See the Code
Final Thoughts
This isn’t just about Bash or GPT. It’s about taking ownership of the systems that support your creativity and operations. If you're curious about automation, DevOps for writers, or want to collaborate—let’s connect.
This is just the beginning.