Dynamic convolution is a novel operator design that increases the representational power of lightweight CNNs without altering their depth or width and with only a negligible increase in computational cost. Developed by Chen et al., dynamic convolution uses multiple parallel convolution kernels, all with the same size and input/output dimensions, in place of the single kernel per layer.

How dynamic convolution works

In dynamic convolution, attention weights for the parallel kernels are generated by a squeeze-and-excitation mechanism, similar to SE blocks. The kernels are then combined dynamically by a weighted summation, with the kernels' weights and biases each aggregated under the same attention weights, and the resulting single kernel is applied to the input feature map X.

The following mathematical expressions depict how dynamic convolution works:

  • $s = \mathrm{softmax}(W_2\,\delta(W_1\,\mathrm{GAP}(X)))$
    The attention weights $s$ are generated by applying global average pooling (GAP) to the input feature map $X$, passing the result through a dense layer parameterized by $W_1$, applying the ReLU activation $\delta$, and feeding the result to a second dense layer parameterized by $W_2$, whose output is normalized by a softmax.
  • $\mathrm{DyConv} = \sum_{k=1}^{K} s_k\,\mathrm{Conv}_k$
    The $K$ parallel convolution kernels are aggregated, weights and biases separately, by a weighted summation with the attention weights $s_k$ from the previous step, producing a single dynamic convolution $\mathrm{DyConv}$.
  • $Y = \mathrm{DyConv}(X)$
    The dynamic convolution $\mathrm{DyConv}$ is applied to the input feature map $X$ to produce the output $Y$.
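To ground these three steps, the following is a minimal PyTorch sketch of a dynamic convolution layer. PyTorch is an assumed choice here, and the class name `DynamicConv2d`, the initialization, the reduction ratio, and the grouped-convolution trick used to batch per-sample kernels are illustrative details rather than the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """One dynamic convolution layer: K parallel kernels mixed by SE-style attention."""
    def __init__(self, in_channels, out_channels, kernel_size=3,
                 K=4, reduction=4, temperature=30.0):
        super().__init__()
        self.temperature = temperature  # annealed toward 1 as training progresses
        # K parallel kernels of identical shape, stored as one tensor
        self.weight = nn.Parameter(0.01 * torch.randn(
            K, out_channels, in_channels, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(K, out_channels))
        # Squeeze-and-excitation-style attention branch: GAP -> FC -> ReLU -> FC
        hidden = max(in_channels // reduction, 4)
        self.fc1 = nn.Linear(in_channels, hidden)
        self.fc2 = nn.Linear(hidden, K)

    def forward(self, x):
        B, C, H, W = x.shape
        # Step 1: s = softmax(W2 delta(W1 GAP(X))), one attention vector per sample
        s = F.adaptive_avg_pool2d(x, 1).flatten(1)                               # (B, C)
        s = F.softmax(self.fc2(F.relu(self.fc1(s))) / self.temperature, dim=1)  # (B, K)
        # Step 2: DyConv = sum_k s_k Conv_k -- aggregate weights and biases
        w = torch.einsum('bk,kocij->bocij', s, self.weight)  # (B, C_out, C_in, k, k)
        b = torch.einsum('bk,ko->bo', s, self.bias)          # (B, C_out)
        # Step 3: Y = DyConv(X). A grouped convolution applies each sample's
        # aggregated kernel to that sample only.
        y = F.conv2d(x.reshape(1, B * C, H, W),
                     w.reshape(B * w.shape[1], C, *w.shape[-2:]),
                     b.reshape(-1),
                     padding=w.shape[-1] // 2,
                     groups=B)
        return y.reshape(B, -1, H, W)

# Usage: the layer is a drop-in replacement for a regular 3x3 convolution
layer = DynamicConv2d(16, 32, kernel_size=3, K=4)
print(layer(torch.randn(8, 16, 56, 56)).shape)  # torch.Size([8, 32, 56, 56])
```

Following the paper, the softmax temperature starts large (near-uniform attention) and is annealed during training, which eases the joint optimization of all $K$ kernels.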

Advantages of using dynamic convolution

Dynamic convolution improves the representational power of lightweight CNNs while keeping the added computational cost very low. Unlike traditional convolution, which applies one static kernel to every input, dynamic convolution aggregates multiple parallel kernels with input-dependent attention, so the effective kernel adapts to each input. This enhances the network's ability to capture nuanced features in the data.

Because dynamic convolution increases representational power without adding depth or width, a lightweight model can approach the accuracy that would otherwise require a deeper and more complex network. Dynamic convolution can therefore help reduce computational cost, offering a practical route to neural networks that are both efficient and accurate.
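The claim about computational cost can be checked with a back-of-envelope estimate. The layer dimensions below (feature-map size, channel counts, $K$, and the reduction ratio) are assumptions chosen for illustration, not figures from the paper:

```python
# Per-layer cost in multiply-accumulate operations (MACs), assumed dimensions.
H = W = 56            # feature-map size
C_in = C_out = 64     # channels
k, K, r = 3, 4, 4     # kernel size, number of parallel kernels, SE reduction

conv = H * W * C_out * C_in * k * k                              # standard conv
attention = H * W * C_in + C_in * (C_in // r) + (C_in // r) * K  # GAP + two FCs
aggregation = K * C_out * C_in * k * k                           # weighted kernel sum

print(f"conv: {conv:,} MACs, dynamic overhead: {(attention + aggregation) / conv:.2%}")
# -> conv: 115,605,504 MACs, dynamic overhead: 0.30%
```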

Comparing dynamic convolution with other techniques

Dynamic convolution compares favorably with other techniques that perform a similar function, most notably CondConv, which also mixes multiple expert kernels conditioned on the input. The key difference lies in how the mixing weights are computed: CondConv produces unconstrained, sigmoid-activated routing weights, whereas dynamic convolution normalizes its attention with a softmax so that the weights sum to one, annealing a softmax temperature during training. This constraint compresses the space of aggregated kernels and makes the joint optimization of kernels and attention easier. In both designs, all parallel kernels share the same size and input/output dimensions, so the layer's external interface is unchanged.
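A two-line sketch makes this contrast concrete. The logits and the temperature value below are dummy values for illustration:

```python
import torch

logits = torch.tensor([1.2, -0.3, 0.8, 0.1])  # dummy per-kernel routing logits

condconv_weights = torch.sigmoid(logits)              # unconstrained, per-kernel
dyconv_weights = torch.softmax(logits / 30.0, dim=0)  # sums to 1; the large initial
                                                      # temperature flattens the
                                                      # distribution early in training

print(condconv_weights.sum(), dyconv_weights.sum())   # ~2.41 vs. exactly 1.0
```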

Limitations of dynamic convolution

Dynamic convolution is not a universal solution for all image classification problems. Because each dynamic layer stores $K$ kernels, model size grows roughly $K$-fold even though the computational cost barely changes. Moreover, the technique is not a substitute for capacity: on very complex datasets, a deeper and more complex network may still capture intricate features better than a lightweight model equipped with dynamic convolution.

Overall, dynamic convolution is a novel operator design that is effective at increasing the representational power of lightweight neural networks. It offers a simple and efficient way to improve model accuracy at low computational cost. While it is not the optimal solution for every image classification problem, dynamic convolution offers an excellent alternative to more complex approaches.
