Eldred Rock Scribe

Active · React, TypeScript, Flask, Whisper, llama.cpp
Figure: Eldred Scribe transcription pipeline

I built Eldred Rock Scribe to turn hour-long investment podcasts into research-ready notes in minutes. The problem is familiar to anyone doing fundamental work: valuable ideas are buried in audio, and manually transcribing and summarizing does not scale. This project automates that workflow end-to-end while keeping inference local and costs predictable.

Architecture

The system is a React + TypeScript frontend talking to a Python Flask API over REST. A user submits a YouTube URL; the backend downloads audio with yt-dlp and FFmpeg, transcribes it with OpenAI Whisper (tiny.en), and summarizes the transcript with a locally hosted Mistral 7B Instruct model via llama.cpp (GGUF, Q4_K_M quantization). Results return as structured JSON and render in the browser — including an interactive mind map built with React Flow.
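The flow described above (download, transcribe, summarize, return JSON) can be sketched as a small orchestrator. This is a minimal illustration, not the project's actual code: the `ScribeResult` fields, the `run_pipeline` function, and the stand-in stage functions are all hypothetical names, and the real stages would call yt-dlp/FFmpeg, Whisper, and llama.cpp instead of the stubs used here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class ScribeResult:
    # Hypothetical response shape; the real API's field names are not documented here.
    url: str
    transcript: str
    summary: str


def run_pipeline(
    url: str,
    download_audio: Callable[[str], str],
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
) -> ScribeResult:
    """Run the three stages in order and collect their outputs."""
    audio_path = download_audio(url)   # assumed to return a local file path
    transcript = transcribe(audio_path)
    summary = summarize(transcript)
    return ScribeResult(url=url, transcript=transcript, summary=summary)


# Stand-in stages so the sketch runs without yt-dlp, Whisper, or llama.cpp.
def fake_download(url: str) -> str:
    return "/tmp/audio.wav"


def fake_transcribe(path: str) -> str:
    return f"transcript of {path}"


def fake_summarize(text: str) -> str:
    return text.upper()


result = run_pipeline(
    "https://example.com/watch?v=demo",
    fake_download,
    fake_transcribe,
    fake_summarize,
)
payload = json.dumps(asdict(result))  # the structured JSON a Flask route could return
```

Passing the stages in as callables keeps the orchestration testable on its own, which fits the thin API boundary the project aims for.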

Technical highlights

  • Long-audio handling: Audio is split into ~5-minute chunks with pydub, transcribed in parallel with concurrent.futures, then stitched into a full transcript.
  • Token-aware LLM summarization: Transcripts are chunked with tiktoken so each pass stays within context limits; outputs are cleaned and merged into a coherent briefing.
  • Investment-oriented prompts: Summaries are tuned for buy-side workflows — business model, moat, capital allocation, risks, and key quotes — not generic abstractive summaries.
  • NLP enrichment: spaCy (en_core_web_sm) supports entity extraction; LexRank (sumy) provides extractive baselines alongside generative summaries.
  • Knowledge-base export: Optional one-click export writes Markdown notes to Obsidian with metadata (runtime, video length, processing efficiency).
  • Companion tooling: A separate Streamlit app handles Substack articles, PDFs, and YouTube captions for non-podcast sources; email parsers extract links from RTF/.eml newsletters for batch ingestion experiments.
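
The first two highlights above (chunked parallel transcription and token-aware chunking) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: `chunk_spans`, `transcribe_parallel`, and `pack_by_token_budget` are hypothetical helpers, the transcriber is passed in as a callable rather than calling Whisper, and the token counter is a plain word count standing in for a tiktoken encoder. In a real pipeline the spans would likely be used to slice a pydub `AudioSegment` by millisecond offsets.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

CHUNK_MS = 5 * 60 * 1000  # roughly 5-minute chunks, expressed in milliseconds

Span = Tuple[int, int]


def chunk_spans(duration_ms: int, chunk_ms: int = CHUNK_MS) -> List[Span]:
    """Return (start, end) offsets covering the audio; the last span may be shorter."""
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


def transcribe_parallel(
    spans: Sequence[Span],
    transcribe_span: Callable[[Span], str],  # assumption: wraps the real Whisper call
    max_workers: int = 4,
) -> str:
    """Transcribe spans concurrently, then stitch the pieces in original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(transcribe_span, spans))  # map preserves input order
    return " ".join(p.strip() for p in parts if p.strip())


def pack_by_token_budget(
    sentences: Sequence[str],
    count_tokens: Callable[[str], int],  # assumption: e.g. len(encoder.encode(text))
    max_tokens: int,
) -> List[str]:
    """Greedily pack sentences into chunks that stay within max_tokens.

    A single sentence longer than the budget still becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

For example, `chunk_spans(12 * 60 * 1000)` yields two full 5-minute spans and one 2-minute remainder. Because `pool.map` returns results in input order, the stitched transcript stays chronological even when chunks finish out of order.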

Why it matters

The stack is deliberately pragmatic: Whisper for robust speech-to-text, a quantized local LLM for privacy and repeatability, and a thin API boundary so the UI can evolve independently. It demonstrates full-stack delivery — from media ingestion and parallel compute to prompt engineering and interactive visualization — applied to a real research workflow, not a toy demo.

Stack: React 19, TypeScript, React Flow, Flask, Whisper, llama.cpp, yt-dlp, FFmpeg, pydub, spaCy, NLTK, BeautifulSoup, Streamlit

Contact

Feel free to reach out with questions, comments, or collaboration opportunities. Reach me at im@christopherpinkston.com.