Residual Blocks are a fundamental building block of modern deep neural networks. Introduced as part of the ResNet architecture, they provide an effective way to train very deep networks.

What are Residual Blocks?

Residual Blocks are skip-connection blocks that learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Denoting the desired underlying mapping of the input ${x}$ as $\mathcal{H}({x})$, the stacked nonlinear layers are made to fit the residual mapping $\mathcal{F}({x}) := \mathcal{H}({x}) - {x}$, and the original mapping is recast as $\mathcal{F}({x}) + {x}$. The intuition is that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping.

In simpler terms, a Residual Block learns the difference between its input and the desired output, instead of trying to learn the desired output directly. The idea is that the network is easier to optimize when the expected output is an incremental correction to the input, rather than a function learned from scratch.
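The snippet below is a minimal conceptual sketch of this idea in plain Python/NumPy; the names `residual_layer` and `learned_residual_fn` are purely illustrative, not part of any real framework API.

```python
import numpy as np

def residual_layer(x, learned_residual_fn):
    # Output the input plus the learned correction F(x), i.e. F(x) + x.
    return x + learned_residual_fn(x)

# If the learned residual is zero, the block reduces to the identity mapping,
# so an extra layer can default to "do nothing" rather than having to relearn x.
print(residual_layer(np.ones(4), lambda x: np.zeros_like(x)))  # [1. 1. 1. 1.]
```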

Why are Residual Blocks important?

Training deep neural networks has always been a challenge due to the vanishing gradient problem: the deeper the network, the smaller the gradients become during backpropagation, which makes it harder to update the weights of early layers effectively. Residual Blocks give gradients a direct path back through the skip connections, which makes very deep networks substantially easier to train.

How do Residual Blocks work?

A Residual Block involves two main components: the identity mapping ${x}$ and the residual mapping $\mathcal{F}({x})$. The identity mapping is a skip connection that carries the block's input, unchanged, around the stacked layers to the point where the block's output is formed.

The residual mapping is computed by transforming ${x}$ through a small stack of convolutional layers, typically with batch normalization and ReLU activations. The residual mapping and the identity mapping are then summed, giving $\mathcal{F}({x}) + {x}$ as the final output of the Residual Block.
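The following is a minimal sketch of a basic Residual Block using tf.keras layers. The filter count and two-convolution structure are illustrative rather than a specific published configuration, and the identity shortcut assumes the block's output has the same number of channels as its input.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """Two 3x3 convolutions (the residual mapping F(x)) plus an identity skip connection."""
    shortcut = x  # identity mapping: the input is carried around the stacked layers

    # Residual mapping F(x): conv -> BN -> ReLU -> conv -> BN
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)

    # Recast mapping F(x) + x, followed by a final nonlinearity
    out = layers.Add()([y, shortcut])
    return layers.ReLU()(out)

# Usage: build a tiny model around a single block
inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs, filters=64)
model = tf.keras.Model(inputs, outputs)
model.summary()
```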

Bottleneck Residual Blocks

For deeper ResNets, such as ResNet-50 and ResNet-101, Bottleneck Residual Blocks are used because they are less computationally expensive. A Bottleneck Residual Block reduces the amount of computation inside each block: a 1x1 convolution first shrinks the number of channels, a 3x3 convolution then operates on this smaller representation, and a final 1x1 convolution restores the original channel dimension. This cuts the cost of the expensive 3x3 convolution while the block still outputs high-dimensional features.
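Here is a minimal sketch of a Bottleneck Residual Block in the same tf.keras style. The 4x expansion factor follows the common ResNet-50 convention, the specific filter counts are illustrative, and the identity shortcut assumes the input already has `filters * expansion` channels.

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, filters=64, expansion=4):
    shortcut = x  # assumes the input has filters * expansion channels

    # 1x1 conv reduces the channel dimension, making the 3x3 conv cheaper
    y = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)

    # 3x3 conv operates on the reduced (bottlenecked) representation
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)

    # 1x1 conv expands back to the original channel dimension
    y = layers.Conv2D(filters * expansion, 1, padding="same")(y)
    y = layers.BatchNormalization()(y)

    out = layers.Add()([y, shortcut])
    return layers.ReLU()(out)

# Usage: the input has 256 channels, matching filters * expansion
inputs = tf.keras.Input(shape=(32, 32, 256))
outputs = bottleneck_block(inputs, filters=64)
model = tf.keras.Model(inputs, outputs)
```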

Residual Blocks have revolutionized the field of deep learning by enabling the successful training of deeper neural networks. They were introduced as part of the ResNet architecture and have been adopted by numerous frameworks, including Keras and TensorFlow. Residual Blocks have proved to be an effective way of addressing vanishing gradients, which previously limited the ability of deep neural networks to achieve state-of-the-art results. With Residual Blocks, much deeper networks can be trained reliably, making it possible to tackle more sophisticated problems with machine learning.
