July 28

Two helpful tools for creating captions

Within higher education, there is a lot of pressure to ensure that our course content is accessible. With growing demand to incorporate video (instructor feedback, announcements, module overviews, etc.), one issue we need to address is how to properly caption content to comply with legislative guidelines and accessibility best practices (see: https://www.3playmedia.com/resources/accessibility-laws/).

Captioning video is an incredibly complicated and error-prone endeavor. At the last Online Learning Consortium conference I attended, I sat in on many sessions about the ADA and talked with vendors that specialize in captioning. These are the notes I took from those sessions:

  • It takes 2 licensed professionals to get 99% accuracy for captions

  • Humans typically are 95% accurate

  • Google is 90-91% accurate

  • Watson is 92% accurate

  • Microsoft Azure is 85-90% accurate
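The gap between these percentages is easier to picture as a count of words someone has to fix. The sketch below is a back-of-the-envelope calculation, not a benchmark. It assumes the figures above are word-level accuracy, uses the low end of each quoted range, and assumes a hypothetical speaking rate of 150 words per minute; none of those assumptions come from the sessions themselves.

```python
# Accuracy figures from the session notes above (low end of each range).
# Assumption: these are word-level accuracy rates.
ACCURACY = {
    "Two licensed professionals": 0.99,
    "Typical human": 0.95,
    "Google": 0.90,
    "Watson": 0.92,
    "Microsoft Azure": 0.85,
}

# Hypothetical speaking rate, used only for illustration.
WORDS_PER_MINUTE = 150


def expected_errors(accuracy: float, minutes: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Approximate number of words in a video likely to need correction."""
    total_words = minutes * wpm
    return round(total_words * (1 - accuracy))


if __name__ == "__main__":
    for source, acc in ACCURACY.items():
        print(f"{source}: ~{expected_errors(acc, minutes=10)} words to fix in a 10-minute video")
```

Under these assumptions, Google's 90% works out to roughly 150 words to correct in a 10-minute video, compared with about 15 for a two-person professional team.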

Interestingly, 50% of all online content is video, and over half of that video is accessed on mobile devices. In education, we are seeing a lot more interactive video, such as video quizzes, professor comments, and video replies. The problem is that accurate transcription may cost a lot of money or take a lot of time (or, likely, both).

Captioning: You can choose 0-2 of the options above

One of our current vendors, Panopto, offers a captioning service. Depending on the turnaround time, we can purchase and enable captioning for $2-5/minute. This provides licensed, Section 508-compliant human transcription (see: https://support.panopto.com/s/article/caption-services). But at scale, that can consume a significant share of the budget.
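As a rough illustration of how that scales, the sketch below multiplies total video minutes by the $2-5/minute range quoted above. The number of courses and minutes of video per course are hypothetical inputs chosen for the example; actual Panopto pricing depends on turnaround time and the institution's contract.

```python
def caption_cost_range(total_minutes: float, low_rate: float = 2.0, high_rate: float = 5.0) -> tuple[float, float]:
    """Return (low, high) cost in dollars for human captioning at a per-minute rate."""
    return total_minutes * low_rate, total_minutes * high_rate


# Hypothetical example: 40 courses, each with 5 hours (300 minutes) of video.
courses = 40
minutes_per_course = 300
low, high = caption_cost_range(courses * minutes_per_course)
print(f"${low:,.0f} to ${high:,.0f}")
```

For these hypothetical inputs (12,000 minutes in total) it prints `$24,000 to $60,000`, before any re-recorded or updated videos are counted.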

So what are the more cost-effective, “DIY” options available to us? There are many possibilities, but I will mention two. These are the low-hanging fruit for professors and departments that are trying to take their first steps.

Google Speech Recognition

Google created an API that allows you to use your computer's microphone to create transcriptions. It works only in the Google Chrome browser and can be accessed here: https://www.google.com/intl/en/chrome/demos/speech.html.


To use this tool, simply click the microphone button on the page. The browser can be minimized (but not closed) and will keep running in the background. You can start the speech recognition, record your video or presentation, and when you are finished, come back to the Google speech recognition page and stop the recording. As you speak, the transcription appears in real time. You may need to go back and edit the transcription (remember, Google's accuracy is typically 90-91%). This is a simple and effective way to get a basic transcription: once you are finished recording, you simply copy and paste the text. I have even played back an audio recording with the microphone placed next to the computer speaker and captured a transcription that way. Give the Google API a try.

YouTube Captioning


I can’t imagine that there is a person reading this blog who isn’t already a consumer of YouTube. However, fewer people are likely producers of content. If you already have a Gmail or Google account, then you can already upload videos to YouTube. If you don’t have an account, creating one is free. YouTube has made great progress in automatically captioning videos, and for the most part the captions are fairly accurate, though accuracy depends on the clarity of speech, the quality of the audio capture, ambient noise, speech accents, etc.

The great thing about using YouTube is that you can either record live or upload a video, and captions can be generated automatically. For info on setting up automatic captioning, see: https://support.google.com/youtube/answer/6373554?hl=en. The big recent news is that YouTube is now going to caption live broadcasts (e.g., Google Hangouts or YouTube Live). Users can either upload a transcript and read from it as a script, or use the platform’s automatic caption technology. Right now, live captioning is still in beta for YouTube power users, but YouTube is planning a broad rollout in the coming months.

An important thing to remember when choosing between the Google speech recognition API, YouTube captioning, or similar platforms is that these are tools to get you started with transcriptions. You will have an opportunity to go through the transcription and tidy it up, and you definitely should review it at least once. Even if the accuracy is 90%, that still leaves 10% errors that need your attention. But then again, that's 90% of the work that you don't need to do, and if you have ever done manual captioning, you know that it is hard, boring, and incredibly time-consuming. So you might as well let automation kick-start the process for you.

Written by Dr. Sean Nufer, Director of Ed Tech for TCS Education System.

Originally published on innovedtech.com

Posted July 28, 2020 by drnufer in category Uncategorized
