Generating podcast transcripts

A handy tool for those of you who, like me, are working on your listening skills. I find that even "easy" podcasts can feel completely incomprehensible because my listening is still so weak. Using a transcript is a game changer! Listening to an episode while reading the transcript, then re-listening without it, suddenly makes everything snap into place. And if there are words I genuinely don't know, the transcript makes it easy to mine them into Anki quickly. Unfortunately, a lot of podcasts hide their transcripts behind paywalls.

To generate transcripts automatically, I have been using OpenAI's Whisper, which is free and can be installed and run locally, even on older hardware.

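Full details are on the project's GitHub page: https://github.com/openai/whisper

You first need to install Python, then install Whisper itself with the command below (Whisper also needs the ffmpeg command-line tool installed on your system):

pip install -U openai-whisper

From there, download an mp3 of an episode of your favourite podcast and generate the transcript by running the following (replace 'audio.mp3' with the file you want to transcribe):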

whisper --model turbo --language Japanese -f txt audio.mp3
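If you build up a folder of episodes, a small Python script can run the same command on every mp3 that doesn't have a transcript yet. This is only a sketch, not part of Whisper. The helper names (whisper_command, pending_episodes, transcribe_folder) are made up for illustration, and the script assumes the whisper command is on your PATH after installation. It also adds Whisper's --output_dir option so each .txt lands next to its mp3, since by default Whisper writes into whatever directory you run it from.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def whisper_command(audio: Path, model: str = "turbo", language: str = "Japanese") -> list[str]:
    # Same options as the command above, plus --output_dir so the transcript
    # is written next to the mp3 instead of into the current directory.
    return [
        "whisper",
        "--model", model,
        "--language", language,
        "-f", "txt",
        "--output_dir", str(audio.parent),
        str(audio),
    ]


def pending_episodes(folder: Path) -> list[Path]:
    # An episode counts as done once a .txt with the same name exists
    # (Whisper names its output after the audio file: ep1.mp3 -> ep1.txt).
    return sorted(p for p in folder.glob("*.mp3") if not p.with_suffix(".txt").exists())


def transcribe_folder(folder: Path) -> None:
    # Run Whisper on each untranscribed episode, one at a time;
    # check=True stops the loop with an error if Whisper fails on a file.
    for audio in pending_episodes(folder):
        print(f"Transcribing {audio.name} ...")
        subprocess.run(whisper_command(audio), check=True)
```

For example, transcribe_folder(Path("podcasts")) would transcribe every new episode in a folder called podcasts (the folder name is just an example). Because finished episodes are skipped, you can re-run it whenever you download more.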

On my old laptop it takes about 1.5 minutes to transcribe each minute of audio, but it just hums along in the background.

It's shocking to me how I can listen to something and catch only individual words, listen again with a transcript and catch virtually everything, and then listen a third time without the transcript and, while I miss a few things, find that it mostly feels clear and easy. Huge help!

by NoobyNort
