OpenAI Launches Whisper API With New Speech-To-Text Capabilities

For the moment, there may be a lull in the terrifying rhetoric some were able to massage out of the GPT-powered Bing Chat during its preview phase. However, OpenAI still has other interesting developments on the docket, including "Whisper," a machine-learning model that aims to make speech-to-text far more effective for a multitude of users.

Speech-to-text is far from a new phenomenon. We've used it for years in voice transcription apps such as Dragon Dictation and in digital assistants from Google, Amazon, and Apple, among others. Whisper's goal is to make this technology more effective by training on a massive dataset, one that lets the AI capture the nuance of everyday speech much more deeply.

Voice transcription's inconsistency is most apparent when you're working outside the English language. Even within that bubble, however, the variance introduced by things like strong regional accents can make it tough for these systems to accurately transcribe your speech.

It sounded like an awesome development when OpenAI released the model back in September 2022, but due to the difficulty and cost of implementation, its adoption has been much slower than ChatGPT's. That could change with OpenAI's announcement of a publicly accessible API for Whisper, giving developers instant access to a speech recognition model trained on 680,000 hours of speech data to offer more effective speech-to-text transcription.

Supercharged speech-to-text now available for third-party apps

For English transcription, Whisper is designed to accurately hear words across a much wider breadth of accents, and it's also trained to filter out the background noise that can often throw these systems off. Whisper also aims to be better at transcribing technical jargon that competing systems might not yet recognize. Whisper API users can transcribe audio in its original language, English or otherwise, and translate speech from other languages into English (translation runs one way only, into English).
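For developers wondering what the two modes look like in practice, here is a minimal Python sketch using only the standard library. It is not taken from OpenAI's announcement: the endpoint paths, the `whisper-1` model name, and the `OPENAI_API_KEY` environment variable are assumptions based on OpenAI's public API documentation, and the helper names (`endpoint_for`, `build_multipart`, `speech_to_text`) are hypothetical.

```python
import json
import os
import urllib.request
import uuid

# Assumptions (not stated in the article): base URL and model identifier
# as described in OpenAI's public API documentation.
API_BASE = "https://api.openai.com/v1/audio"
MODEL = "whisper-1"


def endpoint_for(task: str) -> str:
    """Pick the endpoint: 'transcribe' keeps the spoken language,
    'translate' produces English text."""
    paths = {"transcribe": "transcriptions", "translate": "translations"}
    if task not in paths:
        raise ValueError(f"unknown task: {task!r}")
    return f"{API_BASE}/{paths[task]}"


def build_multipart(fields: dict, filename: str, audio: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one audio file as multipart/form-data.
    Returns the request body and the matching Content-Type header value."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def speech_to_text(path: str, task: str = "transcribe") -> str:
    """Upload an audio file and return the recognized text.
    Requires network access and an API key in OPENAI_API_KEY."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart({"model": MODEL}, os.path.basename(path), audio)
    request = urllib.request.Request(
        endpoint_for(task),
        data=body,  # supplying data makes this a POST request
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

Under those assumptions, `speech_to_text("interview.mp3")` would return a transcript in the language spoken, while `speech_to_text("interview.mp3", task="translate")` would return English text. The file name here is just a placeholder.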

The model was trained on 98 different languages, but only a subset of those is supported in this API. Supported languages include:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

While today's news doesn't come with a ChatGPT-like component for the everyday user to enjoy, it does pave the way for existing apps to more easily tap into this technology and pass its benefits on to their users. Language learning app Speak is among the first to leverage its capabilities. For others, applying for API access is easy, and the costs don't sound too prohibitive: OpenAI offers a rate of just $0.006 per minute of on-demand usage.
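To put that rate in perspective, here is a small Python sketch that estimates the bill for a given amount of audio. It assumes usage is billed linearly on duration with no rounding or minimum charge, which the announcement does not spell out; `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Rate quoted in the article: $0.006 per minute of audio.
RATE_PER_MINUTE = Decimal("0.006")


def estimate_cost(duration_seconds: int) -> Decimal:
    """Estimate the cost in US dollars for audio of the given length.
    Assumption: cost scales linearly with duration, with no rounding
    or minimum charge applied."""
    if duration_seconds < 0:
        raise ValueError("duration must not be negative")
    minutes = Decimal(duration_seconds) / 60
    return minutes * RATE_PER_MINUTE
```

Under that assumption, an hour of audio (`estimate_cost(3600)`) works out to 60 × $0.006 = $0.36.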