Semantic Cross Attention

What is Semantic Cross Attention?

Semantic Cross Attention (SCA) is an attention mechanism used in vision models. It extends cross attention by restricting which tokens may attend to which, according to a semantically defined mask. The goal of SCA is either to provide feature map information from a semantically restricted set of latents, or to allow a set of latents to retrieve information from a semantically restricted region of the feature map.

How Does SCA Work?

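SCA is defined in terms of three inputs, I₁, I₂, and I₃, and an internal attention dimension d_in. I₁ is the set of tokens that attends, I₂ is the set of tokens being attended to, and I₃ is a mask that specifies which tokens of I₁ may attend to which tokens of I₂. The queries (Q) are computed from I₁ with a weight matrix W_Q, while the keys (K) and values (V) are computed from I₂ with weight matrices W_K and W_V. The query-key scores are masked with I₃ so that disallowed pairs receive no weight, and are then normalized using the softmax operation. The resulting attention weights are multiplied by V to give the final output.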
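The computation described above can be sketched in NumPy. This is a minimal illustration rather than a reference implementation: the function name `semantic_cross_attention`, the row-per-token layout, the scaling by the square root of the internal dimension, and the large negative constant `tau` used to block masked pairs are all assumptions that the text does not spell out.

```python
import numpy as np

def semantic_cross_attention(i1, i2, mask, w_q, w_k, w_v, tau=-1e9):
    """Masked cross attention in which tokens of i1 attend tokens of i2.

    Assumed shapes (one token per row):
      i1: (n1, c1), i2: (n2, c2)
      mask: (n1, n2), 1 = pair allowed, 0 = pair blocked
      w_q: (c1, d_in), w_k: (c2, d_in), w_v: (c2, d_in)
    """
    q = i1 @ w_q                      # queries come from the attending input
    k = i2 @ w_k                      # keys come from the attended input
    v = i2 @ w_v                      # values come from the attended input
    d_in = q.shape[-1]
    # Keep allowed scores; replace blocked ones with a large negative value.
    scores = np.where(mask.astype(bool), q @ k.T, tau) / np.sqrt(d_in)
    # Numerically stable softmax over the tokens of i2.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy usage with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
i1 = rng.normal(size=(4, 8))          # 4 attending tokens
i2 = rng.normal(size=(3, 6))          # 3 attended tokens
mask = np.array([[1, 0, 0],
                 [1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]])
w_q = rng.normal(size=(8, 5))
w_k = rng.normal(size=(6, 5))
w_v = rng.normal(size=(6, 5))
out, weights = semantic_cross_attention(i1, i2, mask, w_q, w_k, w_v)
```

In this sketch a blocked pair ends up with an attention weight of effectively zero, so each output row is a mixture of only the value rows that the mask allows.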

Types of SCA

There are three types of SCA, which differ in which set of tokens attends to which.

(a) SCA with Pixels Attending Latents

SCA with pixels X attending latents Z is denoted as SCA(X, Z, S), where S is the semantic mask. The mask forces the pixels of a semantic region to attend only to the latents associated with the same label. The query weights W_Q have dimensions n x d_in, and the key and value weights W_K and W_V have dimensions m x d_in.
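As an illustration of how such a mask could be built, the sketch below derives it from per-pixel and per-latent labels. The label arrays are hypothetical toy data, and the convention that a mask entry of 1 means "may attend" is an assumption.

```python
import numpy as np

# Hypothetical toy data: 6 pixels and 3 latents, each with a semantic label.
pixel_labels = np.array([0, 0, 1, 1, 2, 2])
latent_labels = np.array([0, 1, 2])

# S[i, j] = 1 when pixel i and latent j carry the same label.
# Broadcasting compares every pixel label with every latent label.
S = (pixel_labels[:, None] == latent_labels[None, :]).astype(int)
print(S)
# [[1 0 0]
#  [1 0 0]
#  [0 1 0]
#  [0 1 0]
#  [0 0 1]
#  [0 0 1]]
```

Each row corresponds to a pixel and each column to a latent, so the mask has one row per attending token.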

(b) SCA with Latents Attending Pixels

SCA with latents Z attending pixels X is denoted as SCA(Z, X, S). In this type of SCA, the semantic mask restricts attention so that each latent attends only to the pixels of its corresponding semantic region. The query weights W_Q have dimensions m x d_in, and the key and value weights W_K and W_V have dimensions n x d_in.
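To make the effect of the mask concrete, the sketch below looks at a deliberately simplified case. It assumes all raw attention scores are equal and uses the pixel features directly as values, with no learned weights. Under those assumptions the masked softmax reduces to averaging the pixels of each region. The data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 5 pixels with 2 features each, 2 latents.
pixel_labels = np.array([0, 0, 1, 1, 1])
latent_labels = np.array([0, 1])
pixels = np.array([[1.0, 0.0],
                   [3.0, 0.0],
                   [0.0, 2.0],
                   [0.0, 4.0],
                   [0.0, 6.0]])

# Mask of shape (latents, pixels): latent i may read pixel j only if labels match.
mask = latent_labels[:, None] == pixel_labels[None, :]

# Simplifying assumption: every allowed pair has the same raw score (zero).
scores = np.where(mask, 0.0, -1e9)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Each latent becomes the mean of the pixels in its own region.
latents_out = weights @ pixels
print(latents_out)
# [[2. 0.]
#  [0. 4.]]
```

With learned queries and keys the weights within a region are no longer uniform, but they remain confined to that region.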

(c) SCA with Latents Attending Themselves

SCA with latents Z attending themselves is denoted as SCA(Z, Z, M), where M is a mask over pairs of latents. The mask lets each latent attend only to latents that share its semantic label. The weights W_Q, W_K, and W_V all have dimensions n x d_in.
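A latent-to-latent mask of this kind can be sketched in the same way as the pixel masks. The example assumes several latents may share one label; the labels are hypothetical toy data.

```python
import numpy as np

# Hypothetical toy data: 5 latents, some of which share a semantic label.
latent_labels = np.array([0, 0, 1, 2, 2])

# M[i, j] = 1 when latents i and j carry the same label.
M = (latent_labels[:, None] == latent_labels[None, :]).astype(int)
print(M)
# [[1 1 0 0 0]
#  [1 1 0 0 0]
#  [0 0 1 0 0]
#  [0 0 0 1 1]
#  [0 0 0 1 1]]
```

The mask is symmetric and its diagonal is all ones, since every latent shares a label with itself, so no row is ever fully blocked.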

Applications of SCA

SCA has been applied to tasks including image recognition, object detection, image captioning, and image segmentation. Because the mask limits each token to the semantically relevant part of the other input, SCA can improve the efficiency and accuracy of these tasks.

In summary, Semantic Cross Attention is a form of cross attention in which a semantically defined mask restricts which tokens may attend to which. Its three types differ in whether pixels attend latents, latents attend pixels, or latents attend themselves. SCA has been used in a variety of image processing tasks to improve the efficiency and accuracy of the underlying models.