Conditional Batch Normalization

Conditional Batch Normalization (CBN) is a variant of batch normalization in which the scale and shift parameters, $\gamma$ and $\beta$, are predicted from an embedding, such as a language embedding in visual question answering (VQA), rather than learned directly. This allows the embedding to manipulate entire feature maps: scaling them up or down, negating them, or shutting them off. CBN has also been used in GANs to let class information affect the batch normalization parameters. In this article, we discuss the details of Conditional Batch Normalization and how it works in the training process.

How CBN Works

Consider a single convolutional layer with a batch normalization module $\text{BN}\left(F\_{i,c,h,w}|\gamma\_{c}, \beta\_{c}\right)$, where $F\_{i,c,h,w}$ denotes the activation for example $i$, channel $c$, and spatial location $(h, w)$, and $\gamma\_{c}$ and $\beta\_{c}$ are pretrained scale and shift parameters, respectively. The goal of CBN is to predict these parameters directly from some embedding $\mathbf{e\_{q}}$ (e.g., a question embedding) instead of using the pretrained values.

The authors propose predicting a change to the frozen original parameters, $\Delta\beta\_{c}$ and $\Delta\gamma\_{c}$, for which it is straightforward to initialize a neural network to produce an output with zero mean and small variance. A one-hidden-layer MLP predicts these deltas for all feature maps within the layer:

$$\Delta\beta = \text{MLP}\left(\mathbf{e\_{q}}\right)$$ $$\Delta\gamma = \text{MLP}\left(\mathbf{e\_{q}}\right)$$
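One way to realize this delta-predicting network is sketched below in PyTorch (a minimal sketch, assuming PyTorch; the names `DeltaPredictor`, `emb_dim`, `hidden_dim`, and `num_channels` are illustrative, not from the paper). Producing both deltas from a single MLP head and zero-initializing the output layer is one simple way to get the zero-mean, small-variance initialization described above:

```python
import torch
import torch.nn as nn

class DeltaPredictor(nn.Module):
    """One-hidden-layer MLP mapping an embedding e_q to per-channel deltas."""

    def __init__(self, emb_dim: int, hidden_dim: int, num_channels: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2 * num_channels),  # Δγ and Δβ in one head
        )
        # Zero-init the output layer so Δγ = Δβ = 0 at the start of training,
        # i.e. CBN initially reproduces the pretrained BN exactly.
        nn.init.zeros_(self.mlp[-1].weight)
        nn.init.zeros_(self.mlp[-1].bias)

    def forward(self, e_q: torch.Tensor):
        deltas = self.mlp(e_q)                        # (batch, 2C)
        delta_gamma, delta_beta = deltas.chunk(2, dim=1)
        return delta_gamma, delta_beta                # each (batch, C)
```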

For a feature map with $C$ channels, the MLP outputs a vector of size $C$ for each delta. These predictions are added to the frozen original parameters to obtain the updated scale and shift parameters:

$$ \hat{\beta}\_{c} = \beta\_{c} + \Delta\beta\_{c} $$ $$ \hat{\gamma}\_{c} = \gamma\_{c} + \Delta\gamma\_{c} $$

Finally, batch normalization is applied with these updated parameters:

$$\text{BN}\left(F\_{i,c,h,w}|\hat{\gamma}\_{c}, \hat{\beta}\_{c}\right)$$
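Putting these steps together, the following is a hedged sketch of a full CBN layer, again assuming PyTorch. It wraps a frozen `nn.BatchNorm2d` (holding the pretrained $\gamma\_{c}$, $\beta\_{c}$ and running statistics) and applies the per-example updated parameters; `ConditionalBatchNorm2d` and its constructor arguments are hypothetical names, not an established API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalBatchNorm2d(nn.Module):
    """Wraps a frozen, pretrained BatchNorm2d and conditions it on an embedding."""

    def __init__(self, bn: nn.BatchNorm2d, delta_predictor: nn.Module):
        super().__init__()
        self.bn = bn
        for p in self.bn.parameters():
            p.requires_grad_(False)       # freeze pretrained gamma_c and beta_c
        self.delta_predictor = delta_predictor   # e.g. the DeltaPredictor above

    def forward(self, x: torch.Tensor, e_q: torch.Tensor) -> torch.Tensor:
        # Normalize with the BN running statistics but without the affine step.
        normalized = F.batch_norm(
            x,
            self.bn.running_mean,
            self.bn.running_var,
            weight=None,
            bias=None,
            training=self.training,
            eps=self.bn.eps,
        )
        delta_gamma, delta_beta = self.delta_predictor(e_q)   # each (B, C)
        gamma_hat = self.bn.weight + delta_gamma               # broadcasts to (B, C)
        beta_hat = self.bn.bias + delta_beta
        # Reshape to (B, C, 1, 1) so every example gets its own scale and shift.
        return normalized * gamma_hat[:, :, None, None] + beta_hat[:, :, None, None]
```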

The authors freeze all ResNet parameters, including $\gamma$ and $\beta$, during training. For ResNet, CBN is applied to the three convolutional layers in each block of the four stages of computation.
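Freezing the backbone is straightforward in PyTorch. The sketch below uses torchvision's ResNet-50 purely for illustration (and assumes torchvision >= 0.13 for the `weights` API); the paper's exact backbone and training setup may differ:

```python
import torchvision

# Load a pretrained ResNet-50 for illustration.
resnet = torchvision.models.resnet50(weights="IMAGENET1K_V1")
for p in resnet.parameters():
    p.requires_grad_(False)   # freeze everything, including BN's gamma and beta

# Each bottleneck block in resnet.layer1 .. resnet.layer4 contains three BN
# modules (bn1, bn2, bn3); these are the layers one would wrap with a CBN
# module such as ConditionalBatchNorm2d above, training only the MLPs.
for name, module in resnet.layer4[0].named_children():
    print(name, type(module).__name__)
```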

Advantages of CBN

One of the main advantages of CBN is that it enables class- or language-conditional normalization: an embedding predicts the scale and shift parameters, so the same pretrained network can be modulated for tasks such as image classification, object detection, and visual question answering. CBN can also improve performance when there are class-specific variations in the data distribution. Another advantage is that adding new classes to the dataset does not require fine-tuning the entire model; only the embedding and the MLP need to be fine-tuned.

Disadvantages of CBN

One of the main disadvantages of CBN is its computational cost: each conditioned normalization layer requires its own MLP, which can increase training time significantly, especially for larger models. Additionally, capturing more complex relationships between the conditioning input and the scale and shift parameters may require a larger embedding, which leads to higher memory usage during training.

Conditional Batch Normalization is a powerful tool for conditional normalization in a range of computer vision tasks. While it comes with increased computational cost and memory usage, it can improve performance and removes the need to fine-tune the entire model for new classes. Future research can focus on addressing the computational and memory limitations of CBN, as well as exploring its applications in new domains.
