Spatial Attention Module (ThunderNet)

Understanding the Spatial Attention Module (ThunderNet)

The Spatial Attention Module (SAM) is a key component of ThunderNet, a lightweight two-stage object detector. SAM uses knowledge learned by the Region Proposal Network (RPN) to refine the feature distribution of the feature map before RoI warping. In this article, we go over the math behind SAM, its structure, and its functions.

The Concept behind SAM

SAM uses knowledge from the RPN to distinguish foreground features from background features in the feature map. It re-weights the feature map over the spatial dimensions before RoI warping. The reasoning is that the RPN is trained to recognize foreground regions under the supervision of the ground truths, so its intermediate features already carry information that separates foreground from background. SAM takes two inputs: the thin feature map $\mathcal{F}^{CEM}$ produced by the Context Enhancement Module (CEM) and the intermediate feature map of the RPN, $\mathcal{F}^{RPN}$.

The SAM output $\mathcal{F}^{SAM}$ is represented as:

$\mathcal{F}^{SAM} = \mathcal{F}^{CEM} * \text{sigmoid}\left(\theta\left(\mathcal{F}^{RPN}\right)\right)$

Here $\theta(\cdot)$ is a dimension transformation that matches the number of channels of $\mathcal{F}^{RPN}$ to that of $\mathcal{F}^{CEM}$. The sigmoid function constrains the resulting attention values to lie between 0 and 1, and $\mathcal{F}^{CEM}$ is then re-weighted element-wise by this attention map to obtain a better feature distribution. For computational efficiency, $\theta(\cdot)$ is implemented as a 1×1 convolution, which makes the computational cost of SAM negligible. The figure to the right shows the structure of SAM.
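Writing the formula out per element makes the roles of the 1×1 convolution and the multiplication explicit. The notation below is introduced here for illustration and is not taken from the ThunderNet paper: assume $\theta$ has a weight matrix $W \in \mathbb{R}^{C_{CEM} \times C_{RPN}}$ and a bias vector $b \in \mathbb{R}^{C_{CEM}}$, where $C_{CEM}$ and $C_{RPN}$ are the channel counts of the two feature maps. The SAM output at channel $c$ and spatial position $(h, w)$ is then

$$
\mathcal{F}^{SAM}_{c,h,w} = \mathcal{F}^{CEM}_{c,h,w} \cdot \sigma\!\left( \sum_{k=1}^{C_{RPN}} W_{c,k}\, \mathcal{F}^{RPN}_{k,h,w} + b_c \right), \qquad c = 1, \dots, C_{CEM}
$$

Because the kernel is 1×1, the attention value at a position depends only on the RPN features at that same position, and each channel of $\mathcal{F}^{CEM}$ receives its own attention map.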

Structure of the SAM

Structurally, SAM is lightweight: a 1×1 convolution $\theta$ applied to $\mathcal{F}^{RPN}$, a sigmoid, and an element-wise multiplication with $\mathcal{F}^{CEM}$. This design serves two purposes. First, the attention map re-weights the feature map so that foreground features stand out. Second, because the attention map is computed from RPN features, the loss of the R-CNN subnet also backpropagates into the RPN, which stabilizes RPN training. Both functions are discussed below.
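The three operations can be sketched in plain Python. This is a minimal, framework-free sketch assuming feature maps stored as nested lists indexed `[channel][row][col]`; the function names `conv1x1` and `spatial_attention`, the toy channel counts, and the weight values are hypothetical, chosen for illustration rather than taken from ThunderNet's code.

```python
import math
from typing import List

# Feature maps as nested lists indexed [channel][row][col] (assumption for this sketch).
Tensor3 = List[List[List[float]]]


def conv1x1(x: Tensor3, weight: List[List[float]], bias: List[float]) -> Tensor3:
    """1x1 convolution: at every pixel, a linear map from len(x) input channels
    to len(weight) output channels. weight is indexed [out_channel][in_channel]."""
    in_ch, height, width = len(x), len(x[0]), len(x[0][0])
    return [
        [
            [
                sum(weight[o][i] * x[i][h][w] for i in range(in_ch)) + bias[o]
                for w in range(width)
            ]
            for h in range(height)
        ]
        for o in range(len(weight))
    ]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def spatial_attention(f_cem: Tensor3, f_rpn: Tensor3,
                      theta_w: List[List[float]], theta_b: List[float]) -> Tensor3:
    """F_SAM = F_CEM * sigmoid(theta(F_RPN)), multiplied element-wise.
    theta is a 1x1 convolution mapping the RPN channel count to the CEM channel count."""
    attn = conv1x1(f_rpn, theta_w, theta_b)
    if len(attn) != len(f_cem):
        raise ValueError("theta must output as many channels as F_CEM")
    return [
        [
            [f_cem[c][h][w] * sigmoid(attn[c][h][w]) for w in range(len(f_cem[c][h]))]
            for h in range(len(f_cem[c]))
        ]
        for c in range(len(f_cem))
    ]


# Toy example (hypothetical sizes): F_CEM has 2 channels, F_RPN has 3, on a 2x2 grid.
f_cem = [[[1.0, 2.0], [3.0, 4.0]],
         [[0.5, 0.5], [0.5, 0.5]]]
f_rpn = [[[0.0, 1.0], [0.0, 1.0]],
         [[0.0, 0.0], [0.0, 0.0]],
         [[1.0, 0.0], [1.0, 0.0]]]
theta_w = [[1.0, 0.0, 0.0],   # CEM channel 0 reads RPN channel 0
           [0.0, 0.0, 1.0]]   # CEM channel 1 reads RPN channel 2
theta_b = [0.0, 0.0]

out = spatial_attention(f_cem, f_rpn, theta_w, theta_b)
print([[round(v, 3) for v in row] for row in out[0]])  # [[0.5, 1.462], [1.5, 2.924]]
```

In a deep-learning framework, $\theta$ would typically be a learned layer such as PyTorch's `nn.Conv2d(c_rpn, c_cem, kernel_size=1)`, and the nested loops would collapse into a single broadcasted product with `torch.sigmoid(...)`; the plain-Python version only makes each arithmetic step explicit.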

Functions of the SAM

SAM has two primary functions: refining the feature distribution and stabilizing RPN training. The first function strengthens foreground features and suppresses background features, so the R-CNN subnet works with cleaner features; the ThunderNet authors report that this improves object detection accuracy.
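A toy example shows how the sigmoid gate separates foreground from background. The sketch assumes one row of a single $\mathcal{F}^{CEM}$ channel with a constant value, and hypothetical values of $\theta(\mathcal{F}^{RPN})$ that are strongly positive where the RPN responds to an object and strongly negative elsewhere; the numbers are illustrative, not from the paper.

```python
import math


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


# One row of one F_CEM channel, identical everywhere (hypothetical values).
cem_row = [1.0, 1.0, 1.0, 1.0, 1.0]
# theta(F_RPN) along the same row (hypothetical): positive over an object, negative on background.
rpn_logits = [-4.0, -4.0, 4.0, 4.0, -4.0]

gates = [sigmoid(z) for z in rpn_logits]           # attention values in (0, 1)
sam_row = [f * g for f, g in zip(cem_row, gates)]  # re-weighted features
print([round(v, 3) for v in sam_row])  # [0.018, 0.018, 0.982, 0.982, 0.018]
```

Since the gate never exceeds 1, "strengthening" the foreground is relative: foreground features pass through almost unchanged while background features are scaled toward zero.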

The second function of SAM is to stabilize RPN training. Because the attention map is computed from $\mathcal{F}^{RPN}$, gradients from the R-CNN subnet flow back through SAM into the RPN, in addition to the gradients from the RPN's own loss. The RPN therefore receives extra supervision from the R-CNN subnet, which makes its training more stable and, in turn, improves the accuracy of the detector.
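The extra gradient flow can be made concrete with the chain rule. Assuming, for illustration, that $\theta$ is a 1×1 convolution with weights $W_{c,k}$ and bias $b_c$, and writing $z_{c,h,w} = \sum_k W_{c,k}\,\mathcal{F}^{RPN}_{k,h,w} + b_c$, the R-CNN loss contributes the following term to the gradient of the RPN features (our derivation from the SAM formula, not an equation quoted from the paper):

$$
\frac{\partial \mathcal{L}_{R\text{-}CNN}}{\partial \mathcal{F}^{RPN}_{k,h,w}} = \sum_{c} \frac{\partial \mathcal{L}_{R\text{-}CNN}}{\partial \mathcal{F}^{SAM}_{c,h,w}}\, \mathcal{F}^{CEM}_{c,h,w}\, \sigma(z_{c,h,w})\bigl(1 - \sigma(z_{c,h,w})\bigr)\, W_{c,k}
$$

The sketch below checks this expression against central finite differences at a single pixel. The linear `rcnn_loss` is a hypothetical stand-in for the real R-CNN loss, and all values are made up.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


# A single spatial location suffices, because a 1x1 convolution acts per pixel.
f_cem = [0.8, -1.2]            # F_CEM channels at this pixel (hypothetical)
f_rpn = [0.3, -0.7, 1.1]       # F_RPN channels at this pixel (hypothetical)
W = [[0.5, -0.2, 0.1],         # theta weights, indexed [cem_channel][rpn_channel]
     [0.3, 0.4, -0.6]]
b = [0.05, -0.1]
a = [1.5, -0.7]                # hypothetical dL_RCNN/dF_SAM for each CEM channel


def sam(rpn):
    """Return F_SAM at this pixel and the pre-sigmoid logits z."""
    z = [sum(W[c][k] * rpn[k] for k in range(len(rpn))) + b[c] for c in range(len(W))]
    return [f_cem[c] * sigmoid(z[c]) for c in range(len(W))], z


def rcnn_loss(rpn):
    """Hypothetical linear stand-in for the R-CNN loss: sum_c a_c * F_SAM_c."""
    out, _ = sam(rpn)
    return sum(a[c] * out[c] for c in range(len(out)))


def analytic_grad(rpn):
    """Chain-rule gradient of rcnn_loss with respect to each RPN channel."""
    _, z = sam(rpn)
    return [
        sum(a[c] * f_cem[c] * sigmoid(z[c]) * (1.0 - sigmoid(z[c])) * W[c][k]
            for c in range(len(W)))
        for k in range(len(rpn))
    ]


def numeric_grad(rpn, eps=1e-6):
    """Central finite differences of rcnn_loss."""
    grads = []
    for k in range(len(rpn)):
        plus = list(rpn)
        plus[k] += eps
        minus = list(rpn)
        minus[k] -= eps
        grads.append((rcnn_loss(plus) - rcnn_loss(minus)) / (2 * eps))
    return grads


g_analytic = analytic_grad(f_rpn)
g_numeric = numeric_grad(f_rpn)
for ga, gn in zip(g_analytic, g_numeric):
    print(f"analytic={ga:+.6f}  numeric={gn:+.6f}")
```

The gradient is nonzero whenever $\mathcal{F}^{CEM}$ and $W$ are nonzero, which is the mechanism by which the R-CNN subnet supervises the RPN through SAM.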

The Spatial Attention Module is a simple but effective addition to two-stage object detection. Using only a 1×1 convolution, a sigmoid, and an element-wise product, it refines the feature map so that foreground features are separated from background features, and it stabilizes RPN training through extra gradient flow from the R-CNN subnet to the RPN. These properties make SAM a vital component of ThunderNet.