Breaking down language barriers: ElevenLabs launches multilingual text-to-speech for diverse audiences

August 22, 2023

ElevenLabs, a year-old startup that’s leveraging the power of machine learning for voice cloning and synthesis, today announced the expansion of its platform with a new text-to-speech model that supports 30 languages.

The expansion marks the platform’s official exit from the beta phase, making it ready to use for enterprises and individuals looking to customize their content for audiences worldwide. It comes more than a month after ElevenLabs’ $19 million series A round that valued the company at nearly $100 million.

“ElevenLabs was started with the dream of making all content universally accessible in any language and in any voice. With the release of Eleven Multilingual v2, we are one step closer to making this dream a reality and making human-quality AI voices available in every dialect,” Mati Staniszewski, CEO and cofounder of the company, said in a statement.

“Eventually we hope to cover even more languages and voices with the help of AI and eliminate the linguistic barriers to content,” he added.

Eleven Multilingual v2: How is it helpful?

ElevenLabs offers two main voice-focused AI products: Speech Synthesis and VoiceLab.

The former is a synthesis tool that generates natural-sounding speech from text inputs. The latter is an add-on of sorts that gives users the ability to clone their own voices or generate entirely new synthetic voices (by randomly sampling vocal parameters) for use with the synthesis tool.

Once a user creates their custom voice, they can plug it into the text-to-speech tool to convert any short or long-form content of their choice into their preferred speech with very little effort. Alternatively, they can use a range of premade AI voices from the company or those created and shared publicly by the community.

In the early days, the synthesis tool started off with a model that produced speech only in English. Later, it was expanded to Eleven Multilingual version 1, which used text inputs and AI voices to generate speech in eight languages: English, Polish, German, Spanish, French, Italian, Portuguese and Hindi.

Now, with the release of Eleven Multilingual version 2, the offering can synthesize speech in 30 languages. The additions include Korean, Dutch, Turkish, Swedish, Indonesian, Vietnamese, Filipino, Ukrainian, Greek, Czech, Finnish, Romanian, Danish, Bulgarian, Malay, Hungarian, Norwegian, Slovak, Croatian, Classical Arabic and Tamil.

The move essentially means a person could clone their voice and use it to produce speech in dozens of languages targeting different markets.

According to ElevenLabs, the user has to enter the text in the language of their choice, select the voice they want (pre-made, synthetic or cloned) and adjust a few speech parameters. The model will automatically identify the written language and use the set parameters to generate speech in it. It also maintains the selected voice’s unique characteristics across all languages, including its original accent.

“Our model is able to understand the relations between words and adjust delivery based on context (‘contextual’ text-to-speech). Because there are no hardcoded voice features in the model, it can robustly predict thousands of voice characteristics while creating AI voices. This means the ElevenLabs model can take the text surrounding each generated utterance into account to maintain appropriate flow, rather than generating each utterance separately, which can create voices that sound robotic,” Staniszewski told VentureBeat.

Widespread applications of the text-to-speech tool

Since its launch in beta, ElevenLabs has seen interest from both enterprises and creators and claims to have registered more than a million users worldwide. The latest release is expected to expand not only the platform’s user base but also the amount of content it generates every day.

“We have a number of enterprise clients using our products and their use cases are varied: from voicing characters in video games to voicing customer service avatars, and from recording audiobooks to creating content for the visually impaired,” Staniszewski explained.

Most recently, the company collaborated with ArXiv to publish all of its papers with an audio version for added accessibility. It also partnered with Storytel to expand the options available for audiobooks, offering additional AI voices alongside human narrators. At some point in the future, the CEO expects it might also be able to make dubbing an entire movie into multiple languages completely seamless, while preserving the accents and emotions of the original actors.

More to come

As part of this mission, ElevenLabs plans to expand its products with more languages and features, including a projects tool that will make it easier for users to structure and edit their long-form content. According to Staniszewski, it will add a “Google Docs” level of simplicity to generating speech from lengthier content.

“By the end of the year, we are also planning to release a beta version of our AI dubbing tool which will allow users to instantly convert speech from one language to another, all while preserving the original speakers’ voice,” he noted.

In this space of AI-powered voice and speech generation, ElevenLabs competes with players like MURF.AI, Play.ht and WellSaid Labs. According to Market US, the global market for such tools stood at $1.2 billion in 2022 and is estimated to touch nearly $5 billion in 2032, with a CAGR of slightly above 15.40%.
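
As a rough sanity check on those market figures, the compound growth can be worked out directly. The sketch below is not from Market US; it assumes "nearly $5 billion" means $5.0 billion and "slightly above 15.40%" means exactly 15.4%, and the function names are illustrative.

```python
def project_value(start: float, rate: float, years: int) -> float:
    """Compound `start` forward at `rate` per year for `years` years."""
    return start * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that takes `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the article, in billions of US dollars.
START_2022 = 1.2
END_2032 = 5.0   # assumption: "nearly $5 billion" rounded to 5.0
CAGR = 0.154     # assumption: "slightly above 15.40%" taken as 15.4%
YEARS = 2032 - 2022

projected = project_value(START_2022, CAGR, YEARS)
rate = implied_cagr(START_2022, END_2032, YEARS)

print(f"Projected 2032 market at 15.4% CAGR: ${projected:.2f}B")
print(f"CAGR implied by $1.2B -> $5.0B over 10 years: {rate:.2%}")
```

Under these assumptions, ten years of growth at 15.4% takes $1.2 billion to roughly $5 billion, and the rate implied by the two endpoints is a little over 15%, so the quoted figures are consistent with one another.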
