Instance-Level Meta Normalization

Instance-Level Meta Normalization: A Solution for the Learning-to-Normalize Problem

In computer vision and artificial intelligence, normalization techniques are a crucial part of training neural networks for image recognition tasks. Normalization is the process of scaling and shifting input values so that they are better suited to the learning algorithm. One such method is Instance-Level Meta Normalization (ILM-Norm), which predicts normalization parameters through both forward and backward paths.

Normalization Techniques in Machine Learning

Normalization refers to a family of techniques used in machine learning to transform the input data of a neural network so that it can be learned from more easily. It differs from one-off data pre-processing in that it is applied to the training set and the test set separately, which preserves the distribution of each. A normalization technique scales and shifts the data away from their original measurements, which makes training on that data much easier. Some widely used normalization techniques are:

Batch normalization
Group normalization
Instance normalization
Layer normalization
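
The four techniques listed above differ mainly in which axes of the feature tensor the mean and variance are computed over. The following is a minimal NumPy sketch of that difference, assuming a feature tensor of shape (N, C, H, W) and leaving out the learnable scale and shift. The helper name `normalize` and the group count are illustrative choices, not taken from any library.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Standardize x using the mean and variance taken over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 6, 4, 4))  # (N, C, H, W)

batch_norm = normalize(x, axes=(0, 2, 3))   # per channel, across the batch
layer_norm = normalize(x, axes=(1, 2, 3))   # per sample, across all channels
instance_norm = normalize(x, axes=(2, 3))   # per sample and per channel

# Group normalization: split the channels into groups and
# normalize within each group of each sample.
num_groups = 3  # illustrative value; must divide C
n, c, h, w = x.shape
grouped = x.reshape(n, num_groups, c // num_groups, h, w)
group_norm = normalize(grouped, axes=(2, 3, 4)).reshape(n, c, h, w)
```

Only batch normalization mixes statistics across samples; the other three compute them separately for every sample.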

These normalization techniques are used to tackle problems caused by dataset variance, overfitting, or vanishing gradients. However, each has drawbacks that can make it less effective in some scenarios. That is where Instance-Level Meta Normalization comes in.

What is Instance-Level Meta Normalization (ILM-Norm)?

Instance-Level Meta Normalization (ILM-Norm) is a normalization technique that addresses a problem called learning-to-normalize. This technique builds an auto-encoder that predicts the normalization parameters $\omega$ and $\beta$ as scaling and shifting parameters for recovering the original distribution of the input tensor $x$.

ILM-Norm learns to predict the normalization parameters through both forward and backward paths. This technique primarily focuses on the prediction of the rescaling parameters that are required to restore the original distribution of the feature maps. Unlike other normalization techniques, ILM-Norm uses the mean and variance of $x$ to characterize its statistics instead of using the entire tensor $x$ as input for the auto-encoder.
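
To make the point about statistics concrete, the sketch below computes the mean and variance that would summarize each sample. It assumes an (N, C, H, W) tensor and per-channel statistics; the granularity (per channel rather than per group) and the function name `instance_statistics` are assumptions made for illustration.

```python
import numpy as np

def instance_statistics(x):
    """Return per-sample, per-channel mean and variance of an (N, C, H, W) tensor."""
    mu = x.mean(axis=(2, 3))   # shape (N, C)
    var = x.var(axis=(2, 3))   # shape (N, C)
    return mu, var

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16, 32, 32))
mu, var = instance_statistics(x)

# Each sample is summarized by 2 * C numbers instead of C * H * W values.
full_size = x[0].size                     # 16 * 32 * 32 = 16384
summary_size = mu[0].size + var[0].size   # 16 + 16 = 32
```

Feeding the auto-encoder this compact summary rather than the full tensor keeps its input size independent of the spatial resolution.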

Learning-to-Normalize Problem

The learning-to-normalize problem arises when a deep neural network fails to effectively normalize the input values or feature maps. This problem greatly affects the performance of the model by slowing down the convergence and adversely impacting the test accuracy of the model.

Moreover, normalization techniques that are designed to tackle this issue often focus on providing one set of normalization parameters for the entire batch of training samples. However, the distribution of features can vary significantly within each batch. To address this problem, ILM-Norm generates a unique set of normalization parameters for each sample (instance) instead of providing a single set of parameters for the entire batch.

Working of Instance-Level Meta Normalization (ILM-Norm)

The key idea behind Instance-Level Meta Normalization (ILM-Norm) is to use an encoder-decoder structure to obtain the normalization parameters $\omega$ and $\beta$. The encoder and decoder are trained on the mini-batch data input and output, respectively. The training parameters of the encoder are used to compute the normalization parameters for the input data. The parameter values are then used to shape the inputs of the decoder. The decoder function gives the normalized output.
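
The encoder-decoder idea can be sketched as follows. This is a hedged illustration, not the published architecture: the single-layer encoder and decoder, the `tanh` nonlinearity, the bottleneck width, and the names `predict_params` and `ilm_like_norm` are all assumptions, and randomly initialized weights stand in for trained ones. The sketch applies the predicted $\omega$ and $\beta$ in the familiar scale-and-shift form.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, H, W = 4, 16, 8, 8
hidden = 4  # bottleneck width of the hypothetical auto-encoder

# Randomly initialized weights stand in for trained ones.
W_enc = rng.normal(scale=0.1, size=(2 * C, hidden))
W_dec = rng.normal(scale=0.1, size=(hidden, 2 * C))

def predict_params(x):
    """Map each sample's (mean, variance) statistics to its own (omega, beta)."""
    mu = x.mean(axis=(2, 3))                    # (N, C)
    var = x.var(axis=(2, 3))                    # (N, C)
    stats = np.concatenate([mu, var], axis=1)   # (N, 2C)
    code = np.tanh(stats @ W_enc)               # encoder output: (N, hidden)
    out = code @ W_dec                          # decoder output: (N, 2C)
    omega = 1.0 + out[:, :C]                    # scale, centered at 1
    beta = out[:, C:]                           # shift, centered at 0
    return omega, beta

def ilm_like_norm(x, eps=1e-5):
    """Standardize each sample, then rescale it with its predicted parameters."""
    omega, beta = predict_params(x)
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return omega[:, :, None, None] * x_hat + beta[:, :, None, None]

x = rng.normal(loc=2.0, scale=3.0, size=(N, C, H, W))
omega, beta = predict_params(x)
y = ilm_like_norm(x)
```

Because the parameters are computed from each sample's own statistics, two samples in the same mini-batch generally receive different $\omega$ and $\beta$.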

In other normalization techniques such as batch normalization, the rescaling parameters are $\gamma$ and $\beta$: \begin{equation} \text{BN}(x; \gamma, \beta) = \gamma \frac{x-\mu(x)}{\sigma(x)}+\beta \end{equation} where $\gamma$ and $\beta$ are trainable parameters, and $\mu(x)$ and $\sigma(x)$ are the mean and standard deviation of tensor $x$, respectively.
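
The batch normalization transform can be checked numerically with a short NumPy sketch. It assumes an (N, C, H, W) tensor, per-channel statistics taken over the batch and spatial axes, and a small `eps` added for numerical stability (a common practical detail that does not appear in the formula). Here $\sigma(x)$ is treated as the standard deviation.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Apply gamma * (x - mean) / std + beta with per-channel batch statistics."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    sigma = np.sqrt(x.var(axis=(0, 2, 3), keepdims=True) + eps)
    return gamma.reshape(1, -1, 1, 1) * (x - mu) / sigma + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(8, 3, 4, 4))
gamma = np.array([1.0, 2.0, 0.5])
beta = np.array([0.0, 1.0, -1.0])
y = batch_norm(x, gamma, beta)

# After the transform, channel c has mean beta[c] and
# standard deviation close to gamma[c].
channel_mean = y.mean(axis=(0, 2, 3))
channel_std = y.std(axis=(0, 2, 3))
```

Note that a single `gamma` and `beta` per channel is shared by every sample in the batch, which is the limitation that instance-level parameters aim to remove.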

ILM-Norm, on the other hand, uses a parameter set $\theta$: \begin{equation} \text{ILM}(x; \theta) = \frac{x-\mu(x)}{\text{var}(x)/\theta}+\beta \end{equation} where $\mu(x)$ and $\text{var}(x)$ are the mean and variance of tensor $x$, and $\beta$ is the shifting parameter. In this setup, $\theta$ depends on the information provided by other samples in the same training batch, and each sample gets its own instance-dependent $\theta$.

To improve overall performance, ILM-Norm predicts the instance-specific $\theta$ through end-to-end training using both forward and backward propagation paths. In the forward path, the input tensor $x$ is subjected to normalization with the instance-dependent $\theta$. In the backward path, a reconstruction loss is propagated through the entire network, acting as a feedback mechanism that updates the parameter $\theta$.
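
The interplay of the two paths can be illustrated with a toy example. The sketch below trains a small linear auto-encoder on per-sample statistics, using hand-derived gradients of a mean squared reconstruction loss. It is only meant to show a forward pass followed by a backward update; the linear layers, the loss, the learning rate, and the step count are assumptions, and the actual ILM-Norm architecture and objective are not specified in this text.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8, 8, 8))  # (N, C, H, W)

# Per-sample statistics that the toy auto-encoder tries to reconstruct.
stats = np.concatenate([x.mean(axis=(2, 3)), x.var(axis=(2, 3))], axis=1)  # (N, 2C)

n, d = stats.shape
hidden = 4
W_enc = rng.normal(scale=0.1, size=(d, hidden))
W_dec = rng.normal(scale=0.1, size=(hidden, d))
lr = 0.1
losses = []

for step in range(300):
    # Forward path: encode the statistics, then decode them again.
    code = stats @ W_enc
    recon = code @ W_dec
    err = recon - stats
    losses.append(float(np.mean(err ** 2)))

    # Backward path: gradients of the mean squared reconstruction error.
    grad_recon = 2.0 * err / err.size
    grad_W_dec = code.T @ grad_recon
    grad_code = grad_recon @ W_dec.T
    grad_W_enc = stats.T @ grad_code

    # Gradient descent update of the encoder and decoder weights.
    W_enc -= lr * grad_W_enc
    W_dec -= lr * grad_W_dec
```

In a full model these weights would be updated jointly with the rest of the network rather than in isolation, so the reconstruction signal and the task loss would both shape the predicted parameters.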

Advantages of Instance-Level Meta Normalization (ILM-Norm)

Instance-Level Meta Normalization has several advantages over other normalization techniques. Some of them are:

Instance Dependent: ILM-Norm generates a unique set of normalization parameters for each sample (instance) instead of providing a single set of parameters for the entire batch.
Improved Model Accuracy: ILM-Norm uses instance-level normalization, which improves the convergence rate and accuracy of the deep neural network. It provides better overall test accuracy than other normalization methods.
Faster Convergence: The normalization implemented by ILM-Norm is robust against covariate shift, which improves the convergence rate of the deep neural network.
Less Data Dependence: Other normalization techniques like Batch Normalization depend on the acquired data batch. In comparison, ILM-Norm is less data-dependent as it uses instance-level normalization that can adapt to the variation in data distribution.

Instance-Level Meta Normalization (ILM-Norm) is an effective approach to tackle normalization issues found in deep neural networks. ILM-Norm uses instance-level normalization that improves overall performance and is less data-dependent. By training an auto-encoder that predicts the normalization parameters through both forward and backward paths, it provides unique normalization parameters for each sample (instance), which results in better convergence rates and overall accuracy. This technique has shown promising results when compared to other well-known normalization techniques.