OpenAI launches voice cloning artificial intelligence model Voice Engine

ChatGPT maker OpenAI isn’t content with just disrupting text generation, images, and video with its various AI models. It’s also getting into the last major form of traditional digital media: audio. Specifically, voice cloning.

The company today announced its latest artificial intelligence model, “Voice Engine,” which it says has been in development since 2022 and currently powers OpenAI’s text-to-speech API as well as the new ChatGPT voice and read-aloud features launched earlier this month.

It turns out that the model can also perform voice cloning. Here’s how it works: A human speaker records a 15-second speech sample through a phone or computer microphone, and OpenAI’s Voice Engine generates “natural speech that closely resembles the original speaker,” which can then be used to read aloud any text a human user enters.

Huge impact on the speech audio market

This technology obviously has huge implications for those who regularly record themselves speaking, whether they are podcasters, voiceover artists, spoken word performers, audiobook and advertising narrators, gamers, streamers, customer service agents, salespeople, or members of many other professions.

The launch also puts pressure on other companies working on such technologies, including the well-funded AI startup ElevenLabs as well as Captions, Meta, WellSaid Labs, MyShell and others.

OpenAI further highlights Voice Engine’s ability to support non-verbal individuals by giving them a unique, non-robotic voice, and to assist with therapy and education programs for people with speech impairments or learning needs.

Initial use cases

In a blog post announcing Voice Engine today, OpenAI said that so far it has only made the technology available to “a small group of trusted partners.” Those highlighted and named include:

  1. Age of Learning, an education technology company that uses Voice Engine and GPT-4 to generate pre-scripted and real-time personalized voice content, extending reading assistance and interactivity to diverse student audiences.
  2. HeyGen, an AI visual storytelling platform that lets creators and businesses translate their content into multiple languages. It uses Voice Engine for video translation and for custom, human-like avatars with multilingual voices that retain the original speaker’s accent, helping content reach a global audience.
  3. Dimagi, a software company that makes tools for community health workers, using Voice Engine and GPT-4 to give those workers interactive feedback in a variety of languages and improve the delivery of essential services in remote settings.
  4. Livox, an AI application for augmentative and alternative communication (AAC) devices used by people with speech and hearing difficulties, which integrates Voice Engine to give non-verbal individuals a unique, non-robotic voice across languages.
  5. Norman Prince Neurosciences Institute at Lifespan, a nonprofit medical and teaching organization at Brown University dedicated to helping those with neurological diseases and disorders, which is using Voice Engine to help people with speech disabilities use AI versions of their own voices. Two doctors there, Rohaid Ali and pediatric neurosurgeon Konstantina Svokos, have successfully restored the voice of a brain tumor patient using audio samples from a video of her school project.

The company posted several audio samples on its blog, and emailed them to VentureBeat under embargo, that demonstrate the technology’s human-like speaking capabilities. For example, here is the original “source audio” of the Lifespan patient:

And here is the cloned speech generated with OpenAI’s Voice Engine:

Limited user base by design

But for now, access to the technology is limited. As with Sora, its powerful and extremely realistic video-generating AI model, OpenAI is not currently allowing public use of Voice Engine. Instead, today OpenAI is simply sharing the tool’s existence along with “initial insights and results from a small-scale preview” involving the “small group of trusted partners” who have been given access.

As OpenAI said in a blog post announcing the technology today:

“Due to the potential for misuse of synthetic voices, we are taking a cautious and informed approach to a wider release. We hope to start a conversation about the responsible deployment of synthetic voices and how society can adapt to these new capabilities. Based on these conversations and the results of small-scale testing, we will make more informed decisions about whether and how to deploy this technology at scale.”

A cautious, slow, steady, limited-access approach to releasing Voice Engine makes sense, especially given U.S. President Joseph R. Biden’s recent call for a “ban on AI voice imitation.”

At the core of OpenAI’s deployment strategy is strict compliance with safety and ethical guidelines. Partners testing Voice Engine are subject to usage policies that prohibit unauthorized impersonation and require the informed consent of the original speaker.

In addition, OpenAI has implemented security measures such as watermarking and active monitoring to ensure the responsible use of the technology.
