Online Normalization

Online Normalization is a technique for training deep neural networks. In simple terms, it replaces arithmetic averages over the entire dataset with exponentially decaying averages of online samples, which improves the convergence of training.

What is Online Normalization?

Online Normalization is a normalization technique that makes the training of deep neural networks faster and more accurate. It replaces arithmetic averages over the full dataset with exponentially decaying averages of online samples. The decay factors for the forward and backward passes are hyperparameters of the technique.

Instead of averaging over the entire dataset, Online Normalization uses an ongoing process during the forward pass to estimate activation means and variances. It implements the standard online computation of mean and variance, generalized to process multi-value samples and to apply exponential averaging to the sample statistics. The resulting estimates directly lead to an affine normalization transform.

The algorithm also applies to the outputs of fully connected layers, which produce only one scalar output per feature. In that case the computation simplifies to running estimates of the mean and variance across all samples, as shown below.
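To make this simplification concrete: when each sample contributes a single scalar per feature, the within-sample mean is that scalar itself and the within-sample variance is zero, so the general update rules given in the next section reduce to plain exponentially decaying running estimates:

$$ \mu_{t} = \alpha_{f}\,\mu_{t-1} + \left(1-\alpha_{f}\right)x_{t}, \qquad \sigma^{2}_{t} = \alpha_{f}\,\sigma^{2}_{t-1} + \alpha_{f}\left(1-\alpha_{f}\right)\left(x_{t} - \mu_{t-1}\right)^{2} $$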

How does Online Normalization work?

During the forward pass, Online Normalization maintains running estimates of the activation means and variances, updating them with exponentially decaying averages as each new sample arrives. These estimates directly define an affine normalization transform that is applied to the activations.

Incoming samples are allowed to have multiple scalar components, and the feature-wide mean and variance of a sample are denoted $\mu\left(x_{t}\right)$ and $\sigma^{2}\left(x_{t}\right)$. The scalars $\mu_{t}$ and $\sigma^{2}_{t}$ denote running estimates of the mean and variance across all samples, where the subscript $t$ indexes the time steps at which new incoming samples are processed.

The forward-pass update rules of Online Normalization are:

$$ y_{t} = \frac{x_{t} - \mu_{t-1}}{\sigma_{t-1}} $$

$$ \mu_{t} = \alpha_{f}\mu_{t-1} + \left(1-\alpha_{f}\right)\mu\left(x_{t}\right) $$

$$ \sigma^{2}_{t} = \alpha_{f}\sigma^{2}_{t-1} + \left(1-\alpha_{f}\right)\sigma^{2}\left(x_{t}\right) + \alpha_{f}\left(1-\alpha_{f}\right)\left(\mu\left(x_{t}\right) - \mu_{t-1}\right)^{2} $$

where $\alpha_{f}$ is the forward decay factor.
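As a concrete illustration, here is a minimal NumPy sketch of the forward-pass statistics described above. It is not the paper's reference implementation: the class name `OnlineNormForward`, the default value of `alpha_f`, the `eps` term, and the initialization of the running statistics are illustrative assumptions, and the backward-pass correction (controlled by its own decay factor) is omitted.

```python
import numpy as np

class OnlineNormForward:
    """Minimal sketch of the Online Normalization forward-pass statistics.

    Illustrative only: `alpha_f` is the forward decay factor, while `eps`
    and the initial values of the running statistics are assumptions added
    for numerical stability. The backward-pass machinery is not shown.
    """

    def __init__(self, num_features, alpha_f=0.999, eps=1e-5):
        self.alpha_f = alpha_f
        self.eps = eps
        self.mu = np.zeros(num_features)    # running mean estimate mu_t
        self.var = np.ones(num_features)    # running variance estimate sigma^2_t

    def __call__(self, x):
        """Normalize one incoming sample x with shape (num_features, ...).

        Any trailing (e.g. spatial) dimensions are treated as the multiple
        scalar components of the sample when computing per-feature statistics.
        """
        stat_axes = tuple(range(1, x.ndim))      # reduce over everything but features
        bshape = (-1,) + (1,) * (x.ndim - 1)     # broadcast shape for the statistics

        # y_t = (x_t - mu_{t-1}) / sigma_{t-1}, using the previous step's estimates
        y = (x - self.mu.reshape(bshape)) / np.sqrt(self.var.reshape(bshape) + self.eps)

        # Per-sample, per-feature statistics mu(x_t) and sigma^2(x_t)
        sample_mu = x.mean(axis=stat_axes)
        sample_var = x.var(axis=stat_axes)

        # Exponentially decaying updates of the running estimates
        delta = sample_mu - self.mu
        self.var = (self.alpha_f * self.var
                    + (1 - self.alpha_f) * sample_var
                    + self.alpha_f * (1 - self.alpha_f) * delta ** 2)
        self.mu = self.alpha_f * self.mu + (1 - self.alpha_f) * sample_mu
        return y


# Example: normalize a stream of single samples with 64 channels of 8x8 activations
norm = OnlineNormForward(num_features=64, alpha_f=0.999)
for _ in range(3):
    x = np.random.randn(64, 8, 8)
    y = norm(x)
```

Note that the variance estimate is updated before the mean, since its cross-term uses the previous mean $\mu_{t-1}$.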

Advantages of using Online Normalization

The advantages of using Online Normalization for deep neural network training are as follows:

  • It leads to faster and more stable convergence of training, which saves time and compute resources.
  • It is applicable to many different neural network architectures and can be used with various datasets.
  • It provides more accurate and stable estimates of mean and variance, which leads to better performance of the network on unseen data.

In summary, Online Normalization makes the training of deep neural networks faster and more accurate by replacing arithmetic averages over the full dataset with exponentially decaying averages of online samples; the resulting estimates directly lead to an affine normalization transform. It improves the convergence of training, is applicable to many different neural network architectures and datasets, and provides more accurate and stable estimates of mean and variance, which leads to better performance on unseen data.
