Window-based Discriminator

Overview of Window-based Discriminator

A window-based discriminator is a type of discriminator for generative adversarial networks (GANs) that learns to tell real from generated audio by looking at small chunks of the signal rather than at whole sequences. The method is analogous to the PatchGAN discriminator used for images, but is designed specifically for audio. Its aim is to keep the audio signal coherent across chunks. This article explains what a discriminator is, what a generative adversarial network is, how a window-based discriminator works, and what its benefits are.

Discriminator and Generative Adversarial Network

A discriminator is a deep neural network that learns to distinguish real data from fake (generated) data. During training it is shown examples of both. In simple terms, it acts like a judge in a game who says whether each sample is real or fake. Its role is to penalize the generator whenever generated data can be told apart from real data, which pushes the generator toward producing realistic data.

A Generative Adversarial Network (GAN) consists of two deep neural networks: a generator and a discriminator. The generator produces samples of data (for example images, video, or audio), typically starting from random noise, while the discriminator examines those samples and tries to tell them apart from real ones. The generator is updated repeatedly so that its samples become better at deceiving the discriminator, and the discriminator is updated so that it becomes harder to deceive. If training succeeds, the generator ends up producing samples that are hard to tell apart from real ones.
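The two opposing objectives can be written down in a few lines. The sketch below is illustrative only: it assumes the discriminator outputs a probability that a sample is real and uses binary cross-entropy, which is one common choice of GAN loss among several. The function names are hypothetical and do not come from any particular library.

```python
import math

def bce(p, target):
    """Binary cross-entropy for a single probability p and a 0/1 target."""
    eps = 1e-12                      # keep log() away from 0
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def discriminator_loss(real_probs, fake_probs):
    """The discriminator wants real samples scored near 1 and fakes near 0."""
    real_term = sum(bce(p, 1) for p in real_probs) / len(real_probs)
    fake_term = sum(bce(p, 0) for p in fake_probs) / len(fake_probs)
    return real_term + fake_term

def generator_loss(fake_probs):
    """The generator wants its fakes scored near 1 (non-saturating form)."""
    return sum(bce(p, 1) for p in fake_probs) / len(fake_probs)

# A discriminator that separates real from fake has a low loss...
confident = discriminator_loss(real_probs=[0.9, 0.8], fake_probs=[0.1, 0.2])
# ...and a high loss when the generator fools it.
fooled = discriminator_loss(real_probs=[0.9, 0.8], fake_probs=[0.9, 0.8])
print(confident < fooled)
```

The generator loss moves the opposite way: it is low when the fake samples receive scores close to 1, which is exactly the situation in which the discriminator loss is high.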

What is a Window-based Discriminator?

A window-based discriminator is a patch-based discriminator for audio. A standard GAN discriminator learns to separate the distribution of real audio sequences from that of generated ones by looking at each sequence as a whole. A window-based discriminator instead learns to separate the distribution of real small audio chunks (windows, or patches) from that of generated chunks. Judging the signal window by window in this way encourages the model to keep the audio coherent across patches.

As it learns to distinguish real from fake data, the discriminator keeps updating its parameters to improve its classification accuracy. A window-based discriminator does this on small audio patches rather than on entire speech sequences, so every patch of a generated signal has to look realistic on its own. As a result, the model can capture short-term dependencies that a discriminator judging only the whole sequence may miss, which tends to improve the quality of the synthesized audio.
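The window-by-window scoring described above can be sketched as follows. This is a toy illustration under stated assumptions, not a real model: the "network" is a single hypothetical linear layer with a sigmoid, shared across all windows, and the window size and hop are arbitrary values chosen for the example.

```python
import math

def split_into_windows(signal, window_size, hop):
    """Return overlapping chunks of `signal`, each `window_size` samples long."""
    if window_size <= 0 or hop <= 0:
        raise ValueError("window_size and hop must be positive")
    last_start = len(signal) - window_size
    return [signal[s:s + window_size] for s in range(0, last_start + 1, hop)]

def score_window(window, weights, bias):
    """Hypothetical stand-in for a learned network: linear layer + sigmoid."""
    z = sum(w * x for w, x in zip(weights, window)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def window_discriminator(signal, weights, bias, hop):
    """Score every window with the same weights; one output per window."""
    windows = split_into_windows(signal, len(weights), hop)
    return [score_window(w, weights, bias) for w in windows]

# Toy input: 100 samples of a sine wave, windows of 16 samples, hop of 8.
signal = [math.sin(0.1 * n) for n in range(100)]
weights = [0.05] * 16            # assumed values, not trained
scores = window_discriminator(signal, weights, bias=0.0, hop=8)
print(len(scores))               # 11 windows, so 11 real/fake scores
```

The output is a list of scores, one per window, instead of a single score for the whole signal. A training loss would then be averaged over these per-window scores, so an unrealistic patch anywhere in the signal is penalized.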

Benefits of Window-based Discriminator

Here are some benefits of using a window-based discriminator:

Short-term Dependency Capture: Because every window is judged on its own, the window-based discriminator captures short-term dependencies that a discriminator looking only at whole sequences can miss.
Limited Computational Requirements: The model only ever sees small audio chunks, so it can be smaller than a discriminator that takes in an entire sequence at once, and its computational load is correspondingly lighter.
No Need for Audio Alignment: The same window scoring is applied wherever a chunk falls in the signal, so audio segments do not have to be aligned or brought to a common length first, which makes the discriminator easier and faster to work with. A discriminator that expects an entire sequence of fixed size needs that preparation step.
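A rough way to see why scoring small chunks keeps the model small, and why inputs of different lengths need no special preparation, is to count parameters for two toy scorers. The sketch below assumes the simplest possible model in both cases, one weight per input sample plus a bias. The sample counts, window size, and hop are hypothetical values chosen for the example, and the count says nothing about actual runtime.

```python
def count_windows(num_samples, window_size, hop):
    """Number of full windows that fit in a signal of `num_samples` samples."""
    if num_samples < window_size:
        return 0
    return (num_samples - window_size) // hop + 1

def window_scorer_params(window_size):
    """Toy linear scorer over one window: one weight per sample plus a bias."""
    return window_size + 1

def full_sequence_scorer_params(num_samples):
    """Toy linear scorer over the whole sequence: grows with its length."""
    return num_samples + 1

window_size, hop = 1024, 256     # assumed values for illustration
for num_samples in (16000, 48000):
    print(
        num_samples,
        count_windows(num_samples, window_size, hop),   # 59, then 184
        window_scorer_params(window_size),              # 1025 both times
        full_sequence_scorer_params(num_samples),       # 16001, then 48001
    )
```

In this toy setting the window scorer has the same 1025 parameters for both inputs and simply produces more scores for the longer one, while the whole-sequence scorer is tied to a single input length and grows with it.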

In summary, a window-based discriminator is a GAN discriminator for audio that pushes the generator toward realistic output by capturing short-term dependencies. Analogous to PatchGAN, it learns to classify small audio patches instead of entire sequences. As a result, it is computationally lighter and does not require aligning audio segments, which makes it an attractive choice when training models to generate high-quality audio.