Wednesday, June 4, 2025

From Voice to Value: Building My Whisper + Notion AI Pipeline

When Voice Became a Workflow

On any given day, my INTP brain brims and overflows with *cool ideas* to streamline apps, tools, workflows, and User Experience across virtually every product or tool I touch.

I can't help it. It's an innate curiosity to understand how things work, what makes systems go, and how to make them go better.

For some, it's philosophical questions that keep them up at night (What is the meaning of life? Where do thoughts go?). For me, it's the idea that won’t let go—the one that whispers, "This could make that boring/repetitive process so much better."

Often these ideas stem naturally from the many side projects I dive into.

Photo by Polina Zimmerman
One of those ideas sparked while writing my book—a passion project I’ve been developing over the last year using the notebook system popularized by Mary Adkins and The Book Incubator. Like many great authors—Alice Walker, Toni Morrison, JK Rowling, Stephen King—I draft by hand.

But being an INTP means my thoughts fire off in a thousand directions, often showing up while walking my dog or driving.


I voice-record myself dictating story beats, scenes, or thoughts when inspiration strikes. That, combined with 40,000 words scattered across apps and platforms created a monster.

To make things work for me, I often voice record myself dictating parts of my book using Google Recorder when the spirit strikes, or in Notion, or even in the form of messages I send myself via Slack. What an editing nightmare!

The Problem: Voice Notes Without Structure

I had hundreds of memos—journal entries, creative bursts, and brainstorming sessions—sitting in disorganized folders. They were unnamed, untagged, and completely unsearchable. Manually transcribing and tagging them was both tedious and error-prone. I needed something smarter. And repeatable.

And Then I Had an Idea...

What if I could:

  • Auto-transcribe every voice memo I create

  • Clean file names and log every transcript

  • Summarize each one using GPT and generate tags

  • Automatically organize all of it in Notion by chapter, scene, or theme

BOOM.


A personal creative Ops pipeline. Leveraging Bash, AI, Whisper CLI, Notion, REST APIs, and Python. Here's how I built it—and how it became a proving ground for my growth in DevOps, automation, and product strategy.


Gotta Have a Dream: What I Want This Pipeline to Do


First Stage

  • Trigger auto-transcription for all .m4a files in a given directory

  • Sanitize filenames for safe command line execution

  • Log transcript creation for audit tracking

  • Generate GPT summaries and keyword tags

Second Stage

  • Connect transcript files to Notion via REST API

  • Parse and incorporate typewritten material from Notion and local drafts

  • Auto-summarize chapters or sections with GPT

  • Tag and organize output for fast indexing

Third Stage

  • Organize and index in Notion by:

    • Index ID

    • Tags / Keywords

    • Chapter / Scene reference

    • Word counts

  • Pull all three stages into one cohesive pipeline using Zapier triggers

Illustration of Automation Pipeline using Notion, Whisper and API


From Idea to Reality

As a Certified Scrum Master and Product Owner, I thrive on solving complex problems and turning fragmented chaos into clean, scalable systems. This project gave me free rein to dive deep into scripting, automation, and systems orchestration.

It also gave me a perfect reason to level up my skills:

  • Bash & Linux

  • Python scripting

  • GPT / prompt engineering

  • REST API development

  • Zapier integration

  • Notion API workflows

I realized how much I love building and managing automated solutions that scale. This wasn’t just a side project—it became a lab for testing real-world DevOps, AI workflows, and knowledge operations.


The Stack I Used

🎧 Whisper CLI

I chose OpenAI's Whisper CLI to transcribe audio files (.m4a, .mp3, .wav). It’s impressively accurate and customizable via model size (I use the "medium" model).

✨ Bash + Cron Jobs

The backbone of the pipeline is a series of Bash scripts:

  • transcribe.sh — for single file

  • batch-whisper-transcribe.sh — batch processing

  • clean-filenames.sh — sanitizes input

  • whisper_transcript_log.sh — logs transcript creation events

🚀 GPT API

Summarizes the transcript and creates tags for later indexing.

📂 Notion API

Pushes structured content into my Notion workspace.


The Pipeline (Simplified)

Audio File (.m4a)
    ↓
Bash Script (transcribe)
    ↓
Whisper CLI
    ↓
Transcript JSON
    ↓
Python script
    ↓
GPT Summary
    ↓
Formatted payload → Notion API

Key Lessons Learned

  • Automation = peace of mind. I don’t waste time manually naming, tagging, or processing files anymore.

  • Bash is a force multiplier. It unlocked new workflows and deepened my understanding of how Linux systems operate.

  • APIs are the glue. REST APIs made it possible to bind all these tools together.

  • Being "technical" is mindset first. Building this pipeline helped me bridge strategy and engineering—and reminded me how much I enjoy both.


What’s Next (v2 Goals)

  • Add diarization (speaker labels)

  • Expand to multi-file summarization

  • Build a dashboard to monitor performance + errors

  • Release as a self-hosted toolkit with Zapier triggers


See the Code

GitHub Repo →


Final Thoughts

This isn’t just about Bash or GPT. It’s about taking ownership of the systems that support your creativity and operations. If you're curious about automation, DevOps for writers, or want to collaborate—let’s connect.

This is just the beginning.

No comments:

Post a Comment