Axial Attention is a form of self-attention designed for high-dimensional data tensors, such as the images used in segmentation and the sequence alignments used in protein structure prediction. It is closely related to criss-cross attention, in which each pixel gathers contextual information from all pixels on its criss-cross path (its row and column) in order to capture full-image dependencies. Axial Attention generalizes this idea: rather than attending over a flattened tensor, it applies attention along one axis of the tensor at a time, keeping the computation aligned with the tensor's dimensions and far cheaper than full self-attention.
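
To make the mechanism concrete, here is a minimal single-head sketch in PyTorch. It assumes an input of shape (batch, height, width, channels); the function name `attend_along_axis` is illustrative rather than taken from any paper, and for brevity it omits the learned query/key/value projections a real layer would use.

```python
import math
import torch

def attend_along_axis(x: torch.Tensor, axis: int) -> torch.Tensor:
    """Single-head self-attention restricted to one axis of x.

    x: tensor of shape (batch, height, width, channels).
    axis: 1 to mix along the height axis, 2 to mix along the width axis.
    All other axes are treated as independent batch dimensions, so the
    cost for an N-pixel image is O(N * sqrt(N)) rather than the O(N^2)
    of full self-attention over the flattened image.
    """
    # Move the attended axis next to the channel axis: (..., length, channels).
    x = x.movedim(axis, -2)
    lead_shape = x.shape[:-2]
    length, channels = x.shape[-2], x.shape[-1]
    seq = x.reshape(-1, length, channels)

    # Illustrative simplification: reuse x as query, key, and value;
    # a real layer would apply separate learned linear projections.
    scores = torch.bmm(seq, seq.transpose(1, 2)) / math.sqrt(channels)
    out = torch.bmm(torch.softmax(scores, dim=-1), seq)

    return out.reshape(*lead_shape, length, channels).movedim(-2, axis)

# Axial attention over an image: one pass along height, one along width.
# After the two passes, every pixel has (indirectly) seen the whole image.
x = torch.randn(2, 32, 32, 64)            # (batch, height, width, channels)
y = attend_along_axis(attend_along_axis(x, axis=1), axis=2)
print(y.shape)                            # torch.Size([2, 32, 32, 64])
```

Alternating the two axis-wise passes is what lets information propagate between any pair of pixels: a pixel first mixes with its column, then with a row that already contains column-mixed features.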

History and Development

A direct precursor of axial attention, criss-cross attention, was introduced in the 2019 CCNet paper by researchers at Huazhong University of Science and Technology. Criss-cross attention was designed for semantic segmentation of images: each pixel attends to all pixels on its criss-cross path, that is, its own row and column. By applying the operation recurrently, each pixel can then capture full-image dependencies.

In December 2019, Jonathan Ho and colleagues proposed axial attention in the paper "Axial Attention in Multidimensional Transformers", generalizing this kind of axis-wise attention to data tensors of arbitrary dimensionality. Their structure allows the majority of the context to be computed in parallel during decoding without introducing any independence assumptions, and it became the building block for self-attention-based autoregressive models on high-dimensional data tensors, such as Axial Transformers.

Applications of Axial Attention

Axial attention has been used in a variety of applications, two notable examples being image segmentation and protein sequence interpretation. In image segmentation, axial attention lets each pixel aggregate context from its entire row and column at manageable cost, giving segmentation models access to long-range, full-image context on high-resolution inputs and improving their accuracy.

In protein sequence interpretation, axial attention is used in AlphaFold, a system from DeepMind that predicts the 3D structure of a protein from its amino acid sequence. AlphaFold relies on neural networks to learn the regularities that govern protein folding, and it applies attention along the rows and columns of the multiple sequence alignment (row-wise across residue positions, column-wise across aligned sequences) to capture long-range dependencies between amino acids, which helps improve the accuracy of the predicted structures.
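
The same axial pattern can be sketched for an alignment-shaped tensor, reusing the illustrative `attend_along_axis` helper from above. This is only a hedged sketch of the row/column idea; AlphaFold's actual attention blocks are more elaborate (gated, multi-headed, and biased by pair features).

```python
# (batch, sequences, residues, channels): a stack of aligned sequences.
msa = torch.randn(1, 128, 256, 64)

msa = attend_along_axis(msa, axis=2)  # row-wise: mix residues within each sequence
msa = attend_along_axis(msa, axis=1)  # column-wise: mix aligned residues across sequences
```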

In summary, axial attention is a form of self-attention that has proved useful in a variety of applications, including image segmentation and protein sequence interpretation. Closely related to criss-cross attention, it processes multi-dimensional data by attending along one tensor axis at a time, in line with the tensor's dimensions. Stacking or alternating these axis-wise passes gives each position access to full-image (or full-alignment) context at a fraction of the cost of full self-attention, which is what makes axial attention a valuable tool for anyone working with high-dimensional data tensors.
