Assembly AI releases Universal-1 speech recognition model

Artificial intelligence-as-a-service provider Assembly AI has launched a new speech recognition model called Universal-1. Trained on more than 12.5 million hours of multilingual audio data, it excels at speech-to-text accuracy in English, Spanish, French and German, the company says. It claims that Universal-1 produces 30% fewer hallucinations on speech data and 90% fewer on audio with ambient noise compared to OpenAI’s Whisper Large-v3 model.

In a blog post, the company described Universal-1 as “another milestone in our mission to deliver accurate, faithful and powerful speech-to-text capabilities in multiple languages, helping our customers and developers around the world build speech AI applications.” In addition to better understanding the four major languages, the model can code-switch, transcribing multiple languages within a single audio file.

Assembly AI’s chart shows how its Universal-1 speech recognition model compares to industry peers in generating correct words. Image source: Assembly AI

Universal-1 also supports improved timestamp estimation, which matters for audio and video editing and for dialogue analysis; Assembly AI claims the new model’s timestamps are 13% more accurate than those of its predecessor, Conformer-2. Speaker diarization benefits as a result, with a 14% improvement in concatenated minimum-permutation word error rate (cpWER) and a 71% improvement in speaker count estimation accuracy.
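Those word-level timestamps and speaker labels surface directly in transcription results. Below is a minimal sketch using AssemblyAI’s Python SDK to request speaker labels; the API key and file name are placeholders, and the exact fields assume the SDK’s current interface:

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder

# Ask for speaker labels so the transcript is split into per-speaker utterances.
config = aai.TranscriptionConfig(speaker_labels=True)
transcript = aai.Transcriber().transcribe("panel_discussion.mp3", config=config)

# Each utterance carries a speaker tag plus start/end timestamps in milliseconds.
for utterance in transcript.utterances:
    print(f"[{utterance.start}-{utterance.end} ms] Speaker {utterance.speaker}: {utterance.text}")
```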

Finally, parallel inference is more efficient, reducing turnaround time for long audio files; Universal-1 is said to complete transcription five times faster than Whisper Large-v3. Assembly AI compared the two models’ processing speed on an Nvidia Tesla T4 machine with 16GB of VRAM: with a batch size of 64, Universal-1 transcribes one hour of audio in 21 seconds, while Whisper Large-v3, running at a smaller batch size of 24, takes 107 seconds for the same task.
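For context, those reported figures work out to roughly a 5x speedup; a quick back-of-the-envelope check:

```python
audio_seconds = 3600       # one hour of audio
universal_1_seconds = 21   # reported, batch size 64 on a Tesla T4
whisper_v3_seconds = 107   # reported, batch size 24 on the same GPU

print(audio_seconds / universal_1_seconds)       # ~171x real time
print(audio_seconds / whisper_v3_seconds)        # ~34x real time
print(whisper_v3_seconds / universal_1_seconds)  # ~5.1x, matching the "five times faster" claim
```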

The benefit of improved speech-to-text AI models is that note-taking tools can generate more accurate, hallucination-free notes, identify action items, and organize metadata such as proper nouns, who is speaking, and time information. It will also help creator tools integrate AI-driven video editing workflows, telemedicine platforms automate clinical record entry and claims submission (where accuracy is critical), and more.

The Universal-1 model is available through Assembly AI’s API.
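Getting a transcript out of the API takes only a few lines. A minimal sketch, assuming AssemblyAI’s Python SDK; the API key and audio file are placeholders:

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder

# Upload-and-transcribe in one call; accepts a local file path or a public URL.
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("meeting_recording.mp3")

print(transcript.text)
```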
