CBHG: A Building Block Used in the Tacotron Text-to-Speech Model

CBHG is a building block used in the Tacotron text-to-speech model. Its name reflects its three components: a 1-D convolution bank (CB), a highway network (H), and a bidirectional gated recurrent unit (G). The purpose of CBHG is to extract representations from input sequences, which later stages of the model then use to synthesize speech.

What is CBHG?

The CBHG module consists of a bank of 1-D convolutional filters, followed by highway networks and a bidirectional gated recurrent unit (BiGRU). It is designed to model both local and contextual information in input sequences. The filter bank covers widths from 1 up to K, which resembles modeling unigrams, bigrams, and so on up to K-grams, and so captures local information. The highway networks and the BiGRU then capture contextual information across the whole sequence.

How Does CBHG Work?

The input sequence is first convolved with K sets of 1-D convolutional filters, where the k-th set contains C_k filters of width k (for k = 1, 2, ..., K). These filters model local information in the input sequence. The convolution outputs are stacked together and then max-pooled along time to increase local invariance. The max pooling uses a stride of 1, so the original time resolution is preserved.
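
The bank and pooling steps described above can be sketched in plain Python. This is a minimal illustration rather than Tacotron's actual implementation: the function names (conv1d_same, conv_bank, max_pool_time), the zero-padding scheme, the pooling width of 2, and the random weights are all assumptions made for the example. Batch normalization and activation functions are left out to keep it short.

```python
import random

def conv1d_same(x, filters):
    """Apply 1-D filters to x, zero-padding so the output keeps len(x) frames.

    x:       list of T frames, each a list of D floats
    filters: list of C filters; each filter is a list of k taps,
             and each tap is a list of D weights
    returns: list of T frames, each a list of C floats
    """
    T = len(x)
    k = len(filters[0])
    left = (k - 1) // 2  # how the padding is split is an assumption of this sketch
    out = []
    for t in range(T):
        frame = []
        for f in filters:
            acc = 0.0
            for j in range(k):
                s = t + j - left
                if 0 <= s < T:  # positions outside the sequence count as zeros
                    acc += sum(w * v for w, v in zip(f[j], x[s]))
            frame.append(acc)
        out.append(frame)
    return out

def conv_bank(x, bank):
    """Run every filter set in the bank and stack the outputs along channels."""
    outputs = [conv1d_same(x, filters) for filters in bank]
    return [sum((o[t] for o in outputs), []) for t in range(len(x))]

def max_pool_time(x, width=2):
    """Max-pool along time with stride 1, so the number of frames is unchanged."""
    T = len(x)
    return [
        [max(x[s][c] for s in range(t, min(t + width, T))) for c in range(len(x[0]))]
        for t in range(T)
    ]

# Hypothetical sizes: T frames of dimension D, K filter widths, C filters per width.
random.seed(0)
T, D, K, C = 6, 4, 3, 2
x = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
bank = [
    [[[random.uniform(-1, 1) for _ in range(D)] for _ in range(k)] for _ in range(C)]
    for k in range(1, K + 1)
]
y = max_pool_time(conv_bank(x, bank))
print(len(y), len(y[0]))  # 6 6: T frames, K * C stacked channels
```

Because both the convolutions and the pooling step move one frame at a time, the output has exactly as many frames as the input. Only the channel dimension grows, to the sum of the C_k values.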

The pooled sequence is then passed through a few fixed-width 1-D convolutions, whose outputs are added to the original input sequence via residual connections. Batch normalization is used for all convolutional layers. The result is fed into a multi-layer highway network to extract high-level features.
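
The residual connection and the highway network described above can be sketched as follows. This is a hedged illustration in plain Python: the helper names (linear, highway_layer, residual_add, highway_stack), the ReLU transform, and the hand-picked weights are assumptions for the example, not details confirmed by the text. The fixed-width convolutions and batch normalization are replaced by a stand-in value so that the gating behaviour is easy to see.

```python
import math

def linear(x, W, b):
    """Affine map of one frame: W is a list of rows, b a list of biases."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: a gate t mixes a transformed frame with the input frame."""
    h = [max(0.0, v) for v in linear(x, W_h, b_h)]                 # ReLU transform (assumed)
    t = [1.0 / (1.0 + math.exp(-v)) for v in linear(x, W_t, b_t)]  # sigmoid gate
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

def residual_add(conv_out, original):
    """Residual connection: add the module's original input to the convolution output."""
    return [[c + o for c, o in zip(cf, of)] for cf, of in zip(conv_out, original)]

def highway_stack(frames, layers):
    """Apply each highway layer in turn, independently to every frame."""
    for params in layers:
        frames = [highway_layer(f, *params) for f in frames]
    return frames

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
original = [[1.0, -2.0], [0.5, 0.25]]   # input to the module
conv_out = [[0.5, 1.0], [0.0, 0.75]]    # stand-in for the fixed-width convolution output
mixed = residual_add(conv_out, original)

# A strongly negative gate bias keeps the gate near 0, so frames pass through almost unchanged.
carried = highway_stack(mixed, [(identity, [0.0, 0.0], zeros, [-50.0, -50.0])] * 2)
# A strongly positive gate bias opens the gate, so each frame becomes its ReLU transform.
transformed = highway_stack(mixed, [(identity, [0.0, 0.0], zeros, [50.0, 50.0])])
```

The gate is what separates a highway layer from a plain fully connected layer. Each unit learns how much of its input to carry forward and how much to replace, which can make a deep stack easier to train.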

Finally, a bidirectional GRU RNN is stacked on top to extract sequential features from both forward and backward context. The output of the CBHG module is a sequence of high-level representations that can be used for speech synthesis.
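
The bidirectional pass described above can be sketched as follows. This is a minimal sketch in plain Python, not Tacotron's implementation: the function names, the parameter layout (a dictionary of weight triples), and the random initialization are hypothetical. The update rule follows one common GRU convention, and other formulations swap the roles of z and 1 - z.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, U, b, x, h):
    """Compute W x + U h + b for one time step."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + bi
        for W_row, U_row, bi in zip(W, U, b)
    ]

def gru_step(x, h, p):
    """One GRU update; p maps gate names 'z', 'r', 'h' to (W, U, b) triples."""
    z = [sigmoid(v) for v in affine(*p["z"], x, h)]           # update gate
    r = [sigmoid(v) for v in affine(*p["r"], x, h)]           # reset gate
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(*p["h"], x, gated)]  # candidate state
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def run_gru(frames, p, hidden):
    """Run a GRU over the frames from a zero state and return every hidden state."""
    h, states = [0.0] * hidden, []
    for x in frames:
        h = gru_step(x, h, p)
        states.append(h)
    return states

def bidirectional_gru(frames, p_fwd, p_bwd, hidden):
    """Concatenate forward states with backward states re-aligned to input order."""
    fwd = run_gru(frames, p_fwd, hidden)
    bwd = run_gru(frames[::-1], p_bwd, hidden)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

def init_params(in_dim, hidden, rng):
    """Small random weights, purely for demonstration."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {g: (mat(hidden, in_dim), mat(hidden, hidden), [0.0] * hidden) for g in "zrh"}

rng = random.Random(0)
frames = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
out = bidirectional_gru(frames, init_params(3, 4, rng), init_params(3, 4, rng), hidden=4)
print(len(out), len(out[0]))  # 5 8: one vector per input frame, 2 * hidden wide
```

Each output vector combines a forward state that has seen every frame up to that position with a backward state that has seen every frame from that position onward. This is how the module provides context from both directions at every time step.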

Applications of CBHG

CBHG is mainly used as a building block in the Tacotron text-to-speech model, where it appears in both the encoder and the post-processing network. Tacotron is a neural model that generates speech from text input. Models of this kind are used to produce natural-sounding speech for applications such as voice assistants and audiobooks.

Because CBHG is a general sequence-modeling block, it could in principle be applied to other tasks that require sequence modeling, such as machine translation or text classification.

In summary, CBHG is a building block that Tacotron uses to extract representations from input sequences. It consists of a bank of 1-D convolutional filters, followed by highway networks and a bidirectional gated recurrent unit. Beyond speech synthesis, the same design may suit other sequence modeling tasks such as machine translation or text classification.
