Meta's New Voicebox AI Serves Up Generative Text-To-Speech

June 16, 2023

Meta has introduced its own generative AI model, but instead of creating images like DALL-E or writing answers like ChatGPT, this one focuses on audio generation. Named "Voicebox," Meta's AI tool can instantly generate human-like audio clips. It then goes a step further and offers capabilities like noise removal and speech generation across six languages.

One of the most impressive abilities of Voicebox is clearing noise from an audio clip. For example, if a clip of a person speaking is interrupted by a car horn, the AI model removes the noise and returns nearly crystal-clear audio. It works much like Google's Magic Eraser tool, which removes unwanted objects from a photo and then performs intelligent pixel-filling so that the edited area blends seamlessly with its surroundings.

Voicebox can also perform multi-language speech sampling, and currently offers support for English, French, German, Spanish, Polish, and Portuguese. Thanks to its linguistic chops, Voicebox can return an audio clip in the preferred language, even if the text input is in another language. This could come in handy for conversations where language barriers exist.

Google already offers this convenience right in your ears if you own one of the recent Pixel Buds TWS earbuds and a Pixel phone. Meta has done remarkable work in this field too, thanks to its own Massively Multilingual Speech AI research models, which can identify over 4,000 spoken languages from all over the world.