As journalists, we spend a lot of time transcribing audio recordings into text that is then used for articles. We’re not the only ones with this problem though – academics and researchers, students, and even people who attend a lot of meetings and need to keep everything organised will have ended up with a long transcription queue at some point or another.
Our normal workflow to deal with this has been to keep the audio file playing in QuickTime in the background, as we type in a text editor. There are a couple of obvious problems with this – for one, things like pausing and moving back and forward are needlessly complicated as you move between programs, and for another, controlling playback speed to suit your typing speed isn’t easy either. In short, it’s a really bad workflow.
As a result, we’re always on the lookout for a good app that can solve this problem, because it would make life a lot easier. In one instance where the volume of work was too high, we actually resorted to getting someone from Freelancer.com to help transcribe a book’s worth of research notes, but that’s not a great solution if you are on a limited budget. We decided to ask people what they’re using, and to check tech sites and forums like Product Hunt and Reddit, to find out what the best options are. We came across a lot of recommendations, and then, using some of our interview recordings, took them all for trial runs to see what could be a long-term solution.
From there, we’ve narrowed things down to just a few options that we thought were the best, and the list includes some very different types of solutions. There are basically three ways to end up with a transcript. You can do it manually, using different tools that make the process more efficient. Or you can try to get a computer-generated transcript, which is going to be full of errors, but will at least get you started, and thus reduce the amount of time you spend on a project. Or you could pay someone to turn the transcript around for you – like we did with Freelancer.com. We focussed on the first two methods, and here are our top picks.
Sonix
Sonix is a Web-based transcription tool that worked reasonably well for us. We tried the service with four different audio clips and the results were pretty good. Sonix supports multiple languages but English aside, it’s unlikely that any of those are going to be useful in India. Sonix supports American, British, and Australian accents for English, and has an option for all other English accents.
We uploaded four audio clips to the website to test Sonix. The first was an interview with Amazon’s Tom Taylor, who has an American accent. This clip had the best transcription success rate, with just proper nouns such as Echo being misspelled. It was a 30-minute interview that was transcribed in less than 10 minutes and was quite good overall.
The second clip was an interview with an Indian startup founder in a noisy environment, and the results were quite poor. To be fair, Sonix does mention that it needs audio without much background noise, but even so, the results were very poor. The third clip was a clear recording of an Indian woman speaking about an infrastructure problem. This sound bite was transcribed reasonably well, barring some words that were incorrect.
The final clip was a call recording between two people speaking in English with thick Indian accents. There wasn’t much background noise here, and initially Sonix messed up the transcription completely. We alerted the company about this issue and they responded with an updated transcription that was almost as accurate as the one for the third clip. Sonix says it has multiple transcription systems, and that it used a different model for this clip after we flagged the issue.
In our testing, Sonix turned out to be quite good with high-quality audio files where the speaker is speaking at a moderate pace. When the speakers had thick Indian accents and were speaking fast, Sonix’s results weren’t that great. However, the service has multiple features that make it worth checking out.
We loved the fact that it has a built-in text editor that lets you quickly edit the transcript while listening to the clip. The speed of transcription is also very fast and on par with other services. If you pay for the service it can distinguish between two different speakers and mark them as well. The best feature, however, is a confidence marker where it shows how many words it’s confident that it has transcribed correctly. It colour grades words to show how accurate it thinks they are, a feature that worked well in our tests.
Sonix offers all of these features and more for $6 (around Rs. 450) per hour of transcribed audio, on top of a $15 (around Rs. 1,100) per month subscription fee. The annual plan reduces the subscription fee to $10 (around Rs. 740) per month. The pricing isn’t the cheapest in the market, but the results with high-quality recordings are good enough to consider this service. There’s a 30-minute free trial that you should use to see the results for yourself.
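To make the Sonix pricing concrete, here is a minimal Python sketch of what a month of use might cost. It assumes the per-hour charge is billed proportionally for partial hours and ignores the free trial; the function name `sonix_monthly_cost` is ours, purely for illustration, and not something Sonix provides.

```python
# Rough cost estimate for Sonix, using the prices quoted above.
# Assumption: partial hours are billed proportionally; the free trial is ignored.
HOURLY_RATE = 6               # dollars per hour of transcribed audio
MONTHLY_FEE = 15              # dollars per month on the monthly plan
ANNUAL_PLAN_MONTHLY_FEE = 10  # dollars per month on the annual plan


def sonix_monthly_cost(hours, annual_plan=False):
    """Return the estimated dollar cost of one month with `hours` of audio."""
    fee = ANNUAL_PLAN_MONTHLY_FEE if annual_plan else MONTHLY_FEE
    return fee + HOURLY_RATE * hours


if __name__ == "__main__":
    for hours in (1, 5, 10):
        print(hours, sonix_monthly_cost(hours), sonix_monthly_cost(hours, annual_plan=True))
```

By this estimate, five hours of audio in a month would come to $45 on the monthly plan, or $40 on the annual plan.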
Transcribe by Wreally
The top recommendation across various platforms, Transcribe is an option we also liked for its simplicity and effectiveness. Transcribe is basically an audio player with a notes tool built in, that lets you listen to the recording and make your notes in the same place. You can use keyboard shortcuts for a number of important playback related features, and the combination is a serious step up from using a text editor with QuickTime in the background.
The tool runs on your computer in a browser window, but it also works offline. You can upload the audio, and save the text locally, without any issues. The audio file plays with controls on the top of the page, and there’s a text box below where you can enter the text, complete with formatting, and then export it as a .DOC file, if needed. Shortcuts using the function keys let you pause and play, speed up or slow down the audio, add a timestamp to the text, and so on. If you’re a Mac user, you’ll want to go to settings and have the keys work as function keys rather than controlling things like your brightness and volume, but otherwise it’s the same.
This is obviously a better solution than our normal transcription workflow, and using Transcribe by Wreally, we were able to convert a 30-minute recording into usable text in just over 45 minutes, something that used to take us an hour or a little bit longer.
There’s also an interesting workaround if you want to transcribe without typing; although Transcribe doesn’t let you upload audio files for automatic transcription, you can dictate the words and it’ll automatically type them up, if you’re using Chrome. It only works on Chrome, and so it’s possibly using Google’s speech-to-text APIs – whatever the engine, the results are fairly accurate, although it’s not the best solution. For one thing, you can get the occasional substitution, where “find” becomes “third”, and “numerous” becomes “pneumatic”. For another, it’s just not a great experience to keep repeating everything you’re hearing – you can either listen to the recording or say the words, so it’s hard to keep track, and it requires a lot of pausing and moving back and forth. We also had an issue where the cursor wouldn’t consistently move forwards. Despite these drawbacks, once you have used the dictation function for a while, you get used to its quirks, and it is fast and reliable enough.
Transcribe isn’t free though – the free trial lasts for a week, and after that you have to pay a $20 annual license. That’s a pretty good deal if you use it a lot, though it may feel a little expensive if you aren’t using it often.
You can try Transcribe out for yourself for a week and see if it’s a good fit. If you’re looking for a free alternative, check out oTranscribe. It’s a great option with almost all the same features, but it lacks the dictation mode, so you’ll have to type the whole text.
Trint
Trint is a pretty straightforward service that automatically transcribes the audio files you upload, and sends you a transcript. Trint lets you upload a file and then transcribes it on the Internet – when it’s done (which depends on the length of the audio file), you’ll get an email updating you, so you can close the window and do other work in the meantime. It didn’t take much time though – a 10-minute file took just about four minutes to digest.
However, Trint doesn’t just provide a text file. Instead, after transcribing, it provides a powerful text editor that allows you to listen to the playback while editing the text, just like Transcribe. You can even tag different sections of text by speaker, or add highlights. You can also add strikethrough to text, which tells Trint to skip those parts when playing the audio. When you’re done, you can export the text, which could be as a .DOC file, or a .SRT subtitle file, or if you only need parts of the file, you could choose to export only the highlights.
You can change the playback speed, show a timestamp for every paragraph, or navigate the text by moving back and forth through the audio file. As the audio plays, the related text is highlighted as well, so it’s very easy to keep track. It’s pretty great, though one limitation is that you can only use it on your computer – there are no iOS or Android apps.
The accuracy of the transcription also leaves something to be desired. “Go on and on” somehow turned into “they don’t”, while “obnoxious, arrogant” became “block every”. Our favourite though was “are the envy of” becoming “zombie yo”. By and large though, the text is pretty clean, with around 70 percent of it being correct; and it can speed up the transcription a lot to have this as a starting point.
You’ll be charged at $15 per hour of audio, which isn’t a bad rate, particularly since the recording and the transcript (with all the edits that you make) are always available whenever you need them. You can try Trint for 30 minutes free and see how well it suits your needs. If you’re not interested in paying, you can also use Scribie, which offers unlimited free machine transcription.
Scribie is a little less accurate, and does best with very clear audio and an American accent. In our experience with the same interview text, it was probably around 60 percent accurate to Trint’s 70, although interestingly, the two made different mistakes. Some of the best slip ups were “students” becoming “Shodan”, and “Ivy League” turning into “idli”. The company says it takes up to 30 minutes to transcribe, though our 20 minute clip took between four and five minutes.
Scribie also offers human-processed transcripts, for which it charges $0.60 (roughly Rs. 40) per minute, with a maximum turnaround time of five days. A rush job has a 12-hour turnaround time, and is priced at $2.40 (just over Rs. 150) per minute.
Descript
If you liked the idea of Trint but thought that the interface left something to be desired, and didn’t like the idea of running an app in your browser, give Descript a shot instead. The app is free, and comes with 30 minutes of free transcription, after which you’ll pay $0.15 (roughly Rs. 10) per minute, which is pretty reasonable.
Descript has a great looking Mac app that lets you do all the things that Trint does, starting with an automatic transcription, and then letting you edit the text. You can mark text to skip the audio playback, correcting errors and creating a smooth script that matches the audio perfectly. It’s really great and has all the features you need in an interface that we loved.
As you move through the text, it shows your place in the audio file as well, and allows you to publish the edited audio and text to the Web if you like. It’s powered by Google Speech, and it’s quite accurate, although there are obviously still some errors. We found it to be close to 80 percent accurate, as long as the audio was clear, without overlap, and ideally with American accents.
Descript also offers a monthly subscription plan, where you pay $20 per month up front, but then your fee drops to $0.07 per minute, which sounds like a good option for heavy users.
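How heavy is a "heavy user"? The short Python sketch below works out the break-even point between Descript’s two pricing options, using only the prices quoted above. It assumes a single month of usage and ignores the 30 free minutes; the function names are ours, for illustration only. Prices are kept in whole cents to avoid floating-point rounding.

```python
# Break-even between Descript's pay-as-you-go and subscription pricing.
# Assumptions: one month of usage, free minutes ignored, prices as quoted above.
PAYG_RATE_CENTS = 15            # $0.15 per minute without a subscription
SUBSCRIPTION_FEE_CENTS = 2000   # $20 per month up front
SUBSCRIPTION_RATE_CENTS = 7     # $0.07 per minute with a subscription


def payg_cost_cents(minutes):
    """Cost in cents of `minutes` of audio on pay-as-you-go pricing."""
    return minutes * PAYG_RATE_CENTS


def subscription_cost_cents(minutes):
    """Cost in cents of `minutes` of audio in one month on the subscription."""
    return SUBSCRIPTION_FEE_CENTS + minutes * SUBSCRIPTION_RATE_CENTS


def break_even_minutes():
    """Minutes per month at which both options cost the same."""
    return SUBSCRIPTION_FEE_CENTS / (PAYG_RATE_CENTS - SUBSCRIPTION_RATE_CENTS)


if __name__ == "__main__":
    print(break_even_minutes())
    for minutes in (60, 250, 600):
        print(minutes, payg_cost_cents(minutes) / 100, subscription_cost_cents(minutes) / 100)
```

Under these assumptions the two options cost the same at 250 minutes of audio a month (a little over four hours); below that, paying per minute is cheaper, and above it the subscription wins.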
You can download Descript for free, and try it out with a 30-minute file to get a sense of how it works, before either paying per minute or signing up for a subscription. A Windows version is coming in January 2018. There is no mobile version for Descript either.
In our experience, Descript was probably the best tool of the bunch, though its per minute pricing isn’t fully convenient. As of now, we’re inclined towards Transcribe by Wreally, since it offers an annual subscription with no additional cost, and the dictation mode is a step up from oTranscribe. There were also a number of mobile apps which promised similar experiences, but in our testing were limited. Transcribing that involves a fair amount of typing on a touchscreen still leaves something to be desired, and it’s best to stick with these PC-based options instead.
What about you, which one do you think suits you best? Tell us, and the other readers, via the comments below.