Transcribing podcast episodes


#1

I thought I’d explore transcription options for podcasts.

If an episode is uploaded to YouTube, the default answer appears to be the automatic subtitle generation. For the more traditional podcast formats there isn’t an obvious solution, so I ran a short clip from an upcoming podcast episode through three different transcription services. The text portion of each result is shown below.

Google: Cloud Speech-to-Text API

“I’m here with ready camera I’ll be a software engineer altran previously a successful marketer and copywriter radhika house without formal education made the transition to software engineer welcome with I’d like to explore in a bit your your journey to software engineer how you got started with tech how you picked up the skills that landed you that first engineering job but before we get there can we go back to that marketing business that sounded successful little bit about that and the role that played in”

Additionally, the Cloud Speech-to-Text API provides an array of alternative transcriptions with a confidence score for each. This array was only populated with one result during my tests.
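As a rough illustration of how that array could be used, the sketch below picks the highest-confidence alternative from each result and joins them into one transcript. It assumes the response has been saved as JSON in the REST shape, with `results[].alternatives[]` entries carrying `transcript` and `confidence` fields. The sample data and the `best_transcript` helper are invented for this example and are not taken from my code or from a real response.

```python
import json


def best_transcript(response):
    """Join the highest-confidence alternative from each result."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Assumption: a missing confidence is treated as 0.0.
        best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
        parts.append(best["transcript"].strip())
    return " ".join(parts)


# Hypothetical response, trimmed to the fields used above.
sample = json.loads("""
{
  "results": [
    {"alternatives": [
      {"transcript": "welcome with", "confidence": 0.71},
      {"transcript": "welcome Radhika", "confidence": 0.88}
    ]},
    {"alternatives": [
      {"transcript": " I'd like to explore", "confidence": 0.93}
    ]}
  ]
}
""")

print(best_transcript(sample))
```

With only one alternative per result, as in my tests, this simply concatenates the single transcripts.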

Amazon: Transcribe

“I’m here with riddick, um, arabia. So for engineer eltron, previously a successful marketer and copywriter, radhika has, without formal education, made the transition to software engineer. Welcome, radhika. I’d like to explore all of it. Your your journey to software engineer, how you got started with tech, how you picked up the skills that landed you that first engineering job. But before we get there, can we go back to that marketing business that sounded successful until just a little bit about that on the roll, the tech played in it.”

The Transcribe API is definitely the more fiddly of the two APIs to use, but it returns a little more detail along with the transcription, including timing for each word and an array of alternatives. As with Google’s API, these arrays only ever contained a single value in my testing. A single item from the response looks like this:

{
    "start_time": "9.93",
    "end_time": "10.31",
    "alternatives": [
        {
            "confidence": "0.9931",
            "content": "software"
        }
    ],
    "type": "pronunciation"
},
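The per-word timing and confidence make it possible to point an editor at the words most likely to be wrong. The sketch below is one way that might look, assuming a list of items shaped like the one above. The `flag_uncertain_words` helper, the 0.9 threshold, and the second sample item are all made up for illustration.

```python
def flag_uncertain_words(items, threshold=0.9):
    """Return (start_time, word, confidence) for words below the threshold."""
    flagged = []
    for item in items:
        # Only items like the one shown above are considered here.
        if item.get("type") != "pronunciation":
            continue
        best = item["alternatives"][0]
        # Numbers arrive as strings in the excerpt above, so convert them.
        confidence = float(best["confidence"])
        if confidence < threshold:
            flagged.append((float(item["start_time"]), best["content"], confidence))
    return flagged


items = [
    # The item from the excerpt above.
    {"start_time": "9.93", "end_time": "10.31",
     "alternatives": [{"confidence": "0.9931", "content": "software"}],
     "type": "pronunciation"},
    # Hypothetical low-confidence item, invented for this example.
    {"start_time": "1.12", "end_time": "1.48",
     "alternatives": [{"confidence": "0.4120", "content": "riddick"}],
     "type": "pronunciation"},
]

for start, word, confidence in flag_uncertain_words(items):
    print(f"{start:>7.2f}s  {word}  ({confidence:.2f})")
```

The timestamps make it easy to jump to the right place in the audio and check each flagged word by ear.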

YouTube

“I’m here with Radhika Merapi a software engineer l-tron previously a successful marketer and copywriter Radhika has without formal education made the transition to software engineer welcome Radhika I’d like to explore it a bit your your journey to software engineer how you got started with tech how you’ve picked up the skills that landed you that first engineering job but before we get there can we go back to that marketing business that sounded successful teach us a little bit about that and the role that Tech played in it”

There is no API for the YouTube transcription service. You upload your video and wait. The transcription is available for download in a number of formats for subtitling video.

Code for using the Google Cloud Speech-to-Text API is available here: github.com/billglover/podtotxt


#2

The AWS documentation includes this snippet, which should make the service attractive to podcasters:

“Amazon Transcribe can identify the individual speakers in an audio clip, a technique known as diarization or speaker identification. When you activate speaker identification, Amazon Transcribe includes an attribute that identifies each speaker in the audio clip.”

What is Amazon Transcribe
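For a podcast, the appeal is turning a flat transcript into a dialogue. The sketch below shows the idea, assuming each word has already been paired with its speaker attribute. The flattened `words` structure, the `speaker_label` key, the `spk_0`/`spk_1` values, and the `speaker_turns` helper are simplifications made up for this example, not the exact layout Transcribe returns.

```python
def speaker_turns(words):
    """Group consecutive words from the same speaker into turns."""
    turns = []
    for word in words:
        speaker = word["speaker_label"]
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1][1].append(word["content"])
        else:
            # Speaker changed: start a new turn.
            turns.append((speaker, [word["content"]]))
    return [(speaker, " ".join(parts)) for speaker, parts in turns]


# Hypothetical, simplified input: one entry per word with its speaker.
words = [
    {"speaker_label": "spk_0", "content": "Welcome"},
    {"speaker_label": "spk_0", "content": "Radhika"},
    {"speaker_label": "spk_1", "content": "Thanks"},
    {"speaker_label": "spk_0", "content": "Let's"},
    {"speaker_label": "spk_0", "content": "start"},
]

for speaker, text in speaker_turns(words):
    print(f"{speaker}: {text}")
```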

I might be tempted to try wrapping the Amazon service in something that is a little easier to use than the existing Amazon API.