Bark AI: Text-to-Speech Artificial Intelligence Voice Cloning App & Text-Prompted Generative Audio

🎁 Get our Bark text-to-speech model free at the bottom of this post!

Bark is a text-to-audio model created by Suno. It is based on GPT-style models and can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects.

With Bark, users can also produce nonverbal communications like laughing, sighing, and crying, making it a versatile tool for a variety of applications.

Bark generates speech with minimal tweaking, producing expressive and emotive voices that capture nuances such as tone, pitch, and rhythm. The results can be realistic enough to leave you wondering whether you are listening to a human being.

Notably, Bark supports multiple languages and can generate speech in Mandarin, French, Italian, Spanish, and other languages with impressive clarity and accuracy.

With Bark, you can switch between languages, even within a single prompt, and still get high-quality audio.

Bark is not only intelligent but also intuitive, making it an ideal tool for individuals and businesses looking to create high-quality voice content for their platforms.

Whether you’re looking to create podcasts, audiobooks, video game sounds, or any other form of voice content, Bark has you covered.

BARK Features

Similar to Vall-E and some other amazing work in the field, Bark uses GPT-style models to generate audio from scratch.

Unlike Vall-E, Bark embeds the initial text prompt into high-level semantic tokens without using phonemes.

It can therefore generalize to arbitrary instructions beyond speech that occur in the training data, such as music lyrics, sound effects or other non-speech sounds.

A second model then converts the generated semantic tokens into audio codec tokens, from which the full waveform is produced.
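
The two-stage flow described above can be sketched as a small wrapper. This is only an illustration: `two_stage_generate` is a hypothetical helper, and the two stage functions are passed in as parameters so the order of operations is visible without loading any model. To my knowledge, Bark's `bark.api` module exposes `text_to_semantic` and `semantic_to_waveform`, which could be passed in here, but treat that as an assumption to verify against your installed version.

```python
from typing import Callable, Sequence


def two_stage_generate(
    text: str,
    text_to_semantic: Callable[[str], Sequence[int]],
    semantic_to_waveform: Callable[[Sequence[int]], Sequence[float]],
) -> Sequence[float]:
    """Run the two stages in order: text -> semantic tokens -> waveform.

    Both stage functions are supplied by the caller (hypothetical wiring).
    """
    # Stage 1: embed the raw text prompt into high-level semantic tokens.
    # No phoneme conversion happens in between.
    semantic_tokens = text_to_semantic(text)
    # Stage 2: convert semantic tokens into audio codec tokens and decode
    # them into waveform samples.
    return semantic_to_waveform(semantic_tokens)
```

Because the stages are injected, the same wrapper can be exercised with stub functions first and with the real model functions later.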

from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio

# download and load all models
preload_models()

# generate audio from text
text_prompt = """
     Hello, my name is Suno. And, uh — and I like pizza. [laughs] 
     But I also have other interests such as playing tic tac toe.
"""
audio_array = generate_audio(text_prompt)

# play the generated audio in the notebook
Audio(audio_array, rate=SAMPLE_RATE)

To save audio_array as a WAV file:

from scipy.io.wavfile import write as write_wav

write_wav("/path/to/audio.wav", SAMPLE_RATE, audio_array)
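
If SciPy is not available, a similar result can be had with Python's standard `wave` module. The sketch below assumes the samples are mono floats in the range -1.0 to 1.0 and converts them to 16-bit PCM. `save_wav_16bit` is a hypothetical helper name, not part of Bark.

```python
import array
import sys
import wave
from typing import Iterable


def save_wav_16bit(path: str, samples: Iterable[float], sample_rate: int) -> int:
    """Write mono float samples (assumed in [-1.0, 1.0]) as a 16-bit PCM WAV.

    Returns the number of frames written.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for sample in samples:
        # Clip first so out-of-range values do not overflow int16.
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm.append(int(round(clipped * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)
```

With the earlier example, a call would look like `save_wav_16bit("bark_output.wav", audio_array, SAMPLE_RATE)`.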

Multilingual Support

Bark supports various languages out-of-the-box and automatically determines the language from input text.

When prompted with code-switched text (text that mixes languages), Bark attempts to use the native accent for each language. English quality is currently the best; other languages are expected to improve with scaling.

text_prompt = """
    Buenos días Miguel. Tu colega piensa que tu alemán es extremadamente malo. 
    But I suppose your english isn't terrible.
"""
audio_array = generate_audio(text_prompt)

Music Generation

Bark can generate all types of audio, including music, and in principle does not distinguish between speech and music. As a result, it sometimes chooses on its own to render text as music.

To steer it toward singing, users can add music notes (♪) around their lyrics.

text_prompt = """
    ♪ In the jungle, the mighty jungle, the lion barks tonight ♪
"""
audio_array = generate_audio(text_prompt)
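
Wrapping lyrics by hand gets tedious when there are many lines, so a tiny helper can do it. This is a sketch only: `as_lyrics` is a hypothetical function, not part of Bark, and it simply adds the ♪ markers shown in the prompt above.

```python
def as_lyrics(text: str, note: str = "♪") -> str:
    """Wrap text in music-note markers unless it is already wrapped."""
    stripped = text.strip()
    already_wrapped = (
        len(stripped) > 2 * len(note)
        and stripped.startswith(note)
        and stripped.endswith(note)
    )
    if already_wrapped:
        return stripped
    return f"{note} {stripped} {note}"
```

The result can be passed straight to `generate_audio`, for example `generate_audio(as_lyrics("In the jungle, the mighty jungle"))`.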

Voice/Audio Cloning

Bark has the capability to fully clone voices, including tone, pitch, emotion, and prosody. The model also attempts to preserve music, ambient noise, etc., from input audio.

To mitigate the misuse of this technology, audio history prompts are limited to a set of Suno-provided, fully synthetic options to choose from for each language.

However, we jailbroke that for y'all in our release (below) 😉

text_prompt = """
    I have a silky smooth voice, and today I will tell you about 
    the exercise regimen of the common sloth.
"""
audio_array = generate_audio(text_prompt, history_prompt="en_speaker_1")

Note: since Bark recognizes languages automatically from the input text, it is possible to use, for example, a German history prompt with English text. This usually produces English audio with a German accent.
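
To mix and match history prompts with text in another language, it helps to build the prompt name programmatically. The sketch below infers the naming pattern from the `en_speaker_1` example above and takes the language codes from the "Languages Supported" table in this post. `history_prompt_name` is a hypothetical helper, and which speaker indices actually exist for each language is not stated here, so check the presets that ship with your version.

```python
# Language codes as listed in this post's "Languages Supported" table.
SUPPORTED_LANGUAGES = {
    "en": "English", "de": "German", "es": "Spanish", "fr": "French",
    "hi": "Hindi", "it": "Italian", "ja": "Japanese", "ko": "Korean",
    "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "tr": "Turkish",
    "zh": "Chinese, simplified",
}


def history_prompt_name(lang_code: str, speaker_index: int) -> str:
    """Build a name like 'de_speaker_1' (pattern assumed from the example)."""
    if lang_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {lang_code!r}")
    if speaker_index < 0:
        raise ValueError("speaker_index must be non-negative")
    return f"{lang_code}_speaker_{speaker_index}"
```

For the accent effect described in the note, English text would be paired with `history_prompt=history_prompt_name("de", 1)`.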

Speaker Prompts

Users can provide certain speaker prompts such as NARRATOR, MAN, WOMAN, etc. However, these prompts are not always respected, especially if a conflicting audio history prompt is given.

text_prompt = """
    WOMAN: I would like an oatmilk latte please.
    MAN: Wow, that's expensive!
"""
audio_array = generate_audio(text_prompt)
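
For longer dialogues, the speaker-labelled prompt can be assembled from structured data. This is a minimal sketch: `format_dialogue` is a hypothetical helper that only reproduces the `SPEAKER: line` layout used above, and as noted, Bark may not always respect the labels.

```python
from typing import Iterable, Tuple


def format_dialogue(turns: Iterable[Tuple[str, str]]) -> str:
    """Render (speaker, line) pairs as 'SPEAKER: line', one turn per row."""
    rows = [
        f"{speaker.strip().upper()}: {line.strip()}"
        for speaker, line in turns
    ]
    return "\n".join(rows)
```

The returned string can be used directly as the `text_prompt` argument of `generate_audio`.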

Below is a list of some known non-speech sounds and related prompt conventions:

[laughter]
[laughs]
[sighs]
[music]
[gasps]
[clears throat]
— or … for hesitations
♪ for song lyrics
capitalization for emphasis of a word
MAN/WOMAN: for bias towards speaker
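
A quick way to catch typos in bracketed tags is to compare them against the list above before generating audio. The sketch below is an assumption-laden convenience, not part of Bark: `unknown_tags` is a hypothetical helper, and since the list only covers some known sounds, a flagged tag may still work.

```python
import re
from typing import List

# Bracketed tags taken from the list in this post.
KNOWN_TAGS = {
    "[laughter]", "[laughs]", "[sighs]",
    "[music]", "[gasps]", "[clears throat]",
}

# Matches any non-nested [bracketed] span.
_TAG_PATTERN = re.compile(r"\[[^\[\]]+\]")


def unknown_tags(prompt: str) -> List[str]:
    """Return bracketed tags in the prompt that are not in KNOWN_TAGS."""
    found = _TAG_PATTERN.findall(prompt)
    return [tag for tag in found if tag.lower() not in KNOWN_TAGS]
```

An empty result means every bracketed tag in the prompt appears in the list above.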

Languages Supported

Language	Status
English (en)	✅
German (de)	✅
Spanish (es)	✅
French (fr)	✅
Hindi (hi)	✅
Italian (it)	✅
Japanese (ja)	✅
Korean (ko)	✅
Polish (pl)	✅
Portuguese (pt)	✅
Russian (ru)	✅
Turkish (tr)	✅
Chinese, simplified (zh)	✅
Arabic	Coming soon!
Bengali	Coming soon!
Telugu	Coming soon!

BARK "SERPy" Release!

We’ve got some exciting news for you!

Remember Bark, the new Text2Speech model that was released recently? 🐶🔊

Well, guess what? We’ve managed to reverse engineer it! 🕵️‍♂️🔧

Introducing Bark: Text2Speech Voice Cloning 🐶

We know that Bark’s creators restricted voice cloning and added “allowed prompts” for safety reasons.

But we believe in freedom and creativity! 🌟

✊ So, we’ve cracked open the code and removed those pesky limitations! 🚫🔓

Bark Unleashed! 🎉🐾

Bark Unleashed is a set of easy-to-use Jupyter notebooks that will have you cloning audio from just 5 to 10 second samples of audio/text pairs in no time! 🎙️📝

Get ready to revolutionize your audio game with Bark Unleashed!

Just follow our simple instructions and let your imagination run wild! 🌈

Happy cloning, folks!

👉 Get it FREE here