Coordinate attention

Coordinate attention is an attention mechanism proposed by Hou et al. that embeds positional information into channel attention. It enables a network to focus on large, important regions of the input at low computational cost.

What is Coordinate Attention?

The coordinate attention mechanism is a two-step process consisting of coordinate information embedding and coordinate attention generation. In the first step, two pooling kernels with different spatial extents encode each channel along the horizontal and vertical directions. In the second step, a shared $1\times 1$ convolutional transformation is applied to the concatenated outputs of the two pooling layers. The resulting tensor is split into two separate tensors, yielding attention vectors with the same number of channels as the input $X$, one along the horizontal coordinate and one along the vertical coordinate.

This approach allows the network to accurately locate objects of interest. It has a larger receptive field than BAM and CBAM and models cross-channel relationships effectively, enhancing the expressive power of learned features. Due to its lightweight design and flexibility, it can easily be inserted into classical building blocks of mobile networks.

How Does Coordinate Attention Work?

The coordinate attention mechanism has two main steps, as previously mentioned. The first step is coordinate information embedding, in which two pooling kernels with spatial extents $(1, W)$ and $(H, 1)$ aggregate each channel along the horizontal and vertical directions, respectively, producing two direction-aware tensors $z^h$ and $z^w$ as follows:

\begin{align} z^h &= \text{GAP}^h(X) \\ z^w &= \text{GAP}^w(X) \end{align}
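As a concrete illustration, here is a minimal PyTorch sketch of the embedding step (the framework choice and the tensor sizes are assumptions), using adaptive average pooling to produce the two direction-aware descriptors:

```python
import torch
import torch.nn as nn

# Dummy input: batch of 2 feature maps with C=64 channels, H=32, W=32 (assumed sizes)
x = torch.randn(2, 64, 32, 32)

# Pool over the horizontal (width) direction, keeping the vertical coordinate:
# z^h has shape (N, C, H, 1)
pool_h = nn.AdaptiveAvgPool2d((None, 1))
z_h = pool_h(x)

# Pool over the vertical (height) direction, keeping the horizontal coordinate:
# z^w has shape (N, C, 1, W)
pool_w = nn.AdaptiveAvgPool2d((1, None))
z_w = pool_w(x)

print(z_h.shape, z_w.shape)  # torch.Size([2, 64, 32, 1]) torch.Size([2, 64, 1, 32])
```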

The second step is coordinate attention generation, in which the shared $1\times 1$ convolutional transformation is applied to the concatenated output of the two pooling layers. The resulting tensor is then split into two separate tensors, yielding attention vectors with the same number of channels as the input $X$, one along the horizontal coordinate and one along the vertical coordinate. The following equations describe coordinate attention generation:

\begin{align} f &= \delta(\text{BN}(\text{Conv}_1^{1\times 1}([z^h;z^w]))) \\ f^h, f^w &= \text{Split}(f) \\ s^h &= \sigma(\text{Conv}_h^{1\times 1}(f^h)) \\ s^w &= \sigma(\text{Conv}_w^{1\times 1}(f^w)) \\ Y &= X s^h  s^w \end{align}

Here $\text{GAP}^h$ and $\text{GAP}^w$ are pooling functions for the vertical and horizontal coordinates, respectively, and $s^h \in \mathbb{R}^{C\times H\times 1}$ and $s^w \in \mathbb{R}^{C\times 1\times W}$ denote the corresponding attention weights, which are applied to $X$ by broadcast multiplication.
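Putting the two steps together, the following is a compact PyTorch sketch of a coordinate attention block that follows the equations above. The class name, the reduction ratio, and the choice of ReLU for the non-linearity $\delta$ (the original paper uses a hard-swish activation) are assumptions rather than a reference implementation:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of a coordinate attention block; `reduction` is an assumed hyperparameter."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # z^h: (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # z^w: (N, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)    # shared 1x1 transform
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)                         # delta (non-linearity)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)    # Conv_h^{1x1}
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)    # Conv_w^{1x1}

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        z_h = self.pool_h(x)                                     # (N, C, H, 1)
        z_w = self.pool_w(x).permute(0, 1, 3, 2)                 # (N, C, W, 1), so it can be concatenated with z^h
        f = self.act(self.bn(self.conv1(torch.cat([z_h, z_w], dim=2))))  # (N, mid, H+W, 1)
        f_h, f_w = torch.split(f, [h, w], dim=2)                 # Split(f)
        s_h = torch.sigmoid(self.conv_h(f_h))                    # (N, C, H, 1)
        s_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))  # (N, C, 1, W)
        return x * s_h * s_w                                     # Y = X * s^h * s^w via broadcasting
```

Concatenating the two pooled descriptors along the spatial dimension lets a single shared $1\times 1$ convolution transform both directions at once, which is part of what keeps the block lightweight.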

Benefits of Coordinate Attention

Coordinate attention provides a number of benefits, including its ability to accurately identify the position of a targeted object, its larger receptive field compared to BAM and CBAM, and its effective modeling of cross-channel relationships to enhance the expressive power of learned features. Additionally, this approach is lightweight, flexible, and can be easily incorporated into classical building blocks of mobile networks.
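To illustrate the last point, here is a hedged sketch of how such a block might slot into a mobile-style inverted residual unit, with the attention placed after the depthwise convolution. The block structure and the placement are assumptions made for illustration, and `CoordinateAttention` refers to the sketch above:

```python
import torch
import torch.nn as nn

class InvertedResidualWithCA(nn.Module):
    """Hypothetical mobile building block with coordinate attention inserted
    after the depthwise convolution (an assumed placement)."""
    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),   # pointwise expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),  # depthwise conv
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            CoordinateAttention(hidden),                  # attention block from the sketch above
            nn.Conv2d(hidden, channels, 1, bias=False),   # pointwise projection
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)  # residual connection (stride-1 case assumed)

block = InvertedResidualWithCA(channels=64)
out = block(torch.randn(1, 64, 56, 56))  # output keeps the input shape
```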

Overall, coordinate attention is an attention mechanism that embeds positional information into channel attention through a simple two-step process. Its lightweight design makes it a promising way to enhance the expressive power of learned features in mobile networks while keeping computational costs low.
