OpenAI Open-Sources Whisper, a Multilingual Speech Recognition System • TechCrunch


Speech recognition remains a challenging problem in AI and machine learning. In a step toward addressing it, OpenAI today open-sourced Whisper, an automatic speech recognition system that the company says enables “robust” transcription in multiple languages, as well as translation from those languages into English.

Countless organizations have developed highly capable speech recognition systems that sit at the heart of software and services from tech giants like Google, Amazon and Meta. But what makes Whisper different, according to OpenAI, is that it was trained on 680,000 hours of multilingual and “multitask” data collected from the web, leading to improved recognition of unique accents, background noise and technical jargon.

“The primary intended users of [the Whisper] models are AI researchers studying robustness, generalization, capabilities, biases and constraints of the current model. However, Whisper is also potentially quite useful as an automatic speech recognition solution for developers, especially for English speech recognition,” OpenAI writes in the GitHub repo for Whisper, from which several versions of the system can be downloaded. “[The models] show strong ASR results in ~10 languages. They may exhibit additional capabilities… if fine-tuned on certain tasks like voice activity detection, speaker classification or speaker diarization, but have not been robustly evaluated in these areas.”
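
For developers who want to try it, the open-sourced repo ships as a pip-installable Python package. The snippet below is a minimal sketch based on the repo’s documented API; the checkpoint name (“base”) and the audio filename are illustrative placeholders, and ffmpeg is assumed to be installed on the system.

```python
# pip install -U openai-whisper   (ffmpeg must be installed separately)
import whisper

# Load one of the released checkpoints (tiny, base, small, medium, large).
model = whisper.load_model("base")

# Transcribe speech in its original language...
result = model.transcribe("audio.mp3")  # placeholder filename
print(result["text"])

# ...or translate non-English speech directly into English.
result = model.transcribe("audio.mp3", task="translate")
print(result["text"])
```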

Whisper has its limitations, particularly in the area of text prediction. Because the system was trained on a large amount of “noisy” data, OpenAI cautions that Whisper might include words in its transcriptions that weren’t actually spoken, possibly because it is trying both to predict the next word in the audio and to transcribe the audio itself. Moreover, Whisper doesn’t perform equally well across languages, suffering from higher error rates for speakers of languages that aren’t well represented in the training data.
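
On the hallucination point, the repo’s transcribe function does expose decoding options that can temper the behavior. The sketch below is illustrative rather than an OpenAI recommendation: it uses greedy decoding and stops feeding previously generated text back into the model, which in practice can reduce runaway or repeated output.

```python
import whisper

model = whisper.load_model("base")

result = model.transcribe(
    "audio.mp3",                       # placeholder filename
    temperature=0.0,                   # greedy decoding, no sampling
    condition_on_previous_text=False,  # don't condition on earlier output
    no_speech_threshold=0.6,           # skip segments judged to be silence
)
print(result["text"])
```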

Unequal performance across groups of speakers is nothing new to the world of speech recognition, unfortunately. Bias has long plagued even the best systems: a 2020 Stanford study found that systems from Amazon, Apple, Google, IBM and Microsoft misrecognized audio from Black users at a word error rate of about 35%, nearly double the rate for white users.

Despite this, OpenAI sees Whisper’s transcription capabilities being used to enhance existing accessibility tools.

“While Whisper models cannot be used for real-time transcription out of the box, their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation,” the company continues on GitHub. “The real value of beneficial applications built on top of Whisper models suggests that the disparate performance of these models may have real economic implications… [W]e hope the technology will be used primarily for beneficial purposes. Making automatic speech recognition technology more accessible could also enable more actors to build capable surveillance technologies or scale up existing surveillance efforts, as the speed and accuracy allow for affordable automatic transcription and translation of large volumes of audio communication.”
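
To make the “near-real-time” point concrete, here is one toy pattern: splitting a long recording into 30-second windows (the model’s native input length) and transcribing each as it arrives. This is a hypothetical sketch, not code from the repo, and the filename is a placeholder; a real application would buffer audio from a live source instead of a file.

```python
import whisper

model = whisper.load_model("base")

audio = whisper.load_audio("meeting.wav")  # hypothetical recording
sr = whisper.audio.SAMPLE_RATE             # Whisper resamples to 16 kHz
window = 30 * sr                           # 30-second chunks

for start in range(0, len(audio), window):
    chunk = audio[start:start + window]
    result = model.transcribe(chunk, fp16=False)  # fp16=False avoids a CPU warning
    print(f"[{start // sr:4d}s] {result['text'].strip()}")
```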

The Whisper release isn’t necessarily indicative of OpenAI’s future plans. While increasingly focused on commercial efforts like DALL-E 2 and GPT-3, the company is pursuing several purely theoretical research threads, including AI systems that learn by watching videos.


