SAGAN Self-Attention Module

SAGAN Self-Attention Module: An Overview

The SAGAN Self-Attention Module is a core component of the Self-Attention GAN (SAGAN) architecture for image synthesis. Self-attention refers to the network's ability to attend to different parts of an image with varying degrees of focus. The module assigns different weights to different regions of the input feature map, placing more emphasis on non-local cues that may be essential for synthesizing a particular image.

The Function of the SAGAN Self-Attention Module

In the SAGAN Self-Attention Module, image features from the previous hidden layer $\textbf{x} \in \mathbb{R}^{C \times N}$ are transformed into two feature spaces, $\textbf{f}$ and $\textbf{g}$, to calculate the attention. These feature spaces, defined by learned weight matrices $\textbf{W}_{f}$ and $\textbf{W}_{g}$ respectively, are obtained by applying $1 \times 1$ convolutions to the input feature map, so that $\textbf{f}\left(\textbf{x}\right) = \textbf{W}_{f}\textbf{x}$ and $\textbf{g}\left(\textbf{x}\right) = \textbf{W}_{g}\textbf{x}$.

Using feature spaces $\textbf{f}$ and $\textbf{g}$, the attention score $\beta_{j, i}$ is calculated; it indicates the extent to which the model attends to the $i$th location when synthesizing the $j$th region of the image. First the energy $s_{ij} = \textbf{f}\left(\textbf{x}_{i}\right)^{\top}\textbf{g}\left(\textbf{x}_{j}\right)$ is computed, and $\beta_{j, i}$ is then obtained by normalizing with a softmax:

$$\beta_{j, i} = \frac{\exp\left(s_{ij}\right)}{\sum^{N}_{i=1}\exp\left(s_{ij}\right)} $$
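In code, this step amounts to a matrix of pairwise dot products between the projected features, followed by a softmax over the source positions $i$. The sketch below assumes the shapes from the previous snippet, with random tensors standing in for $\textbf{f}(\textbf{x})$ and $\textbf{g}(\textbf{x})$:

```python
import torch

B, Cb, N = 1, 8, 256        # batch, bottleneck channels (C // 8), N = H * W
f = torch.randn(B, Cb, N)   # stands in for f(x) computed above
g = torch.randn(B, Cb, N)   # stands in for g(x) computed above

s = torch.einsum('bci,bcj->bij', f, g)  # s[b, i, j] = f(x_i) . g(x_j)
beta = torch.softmax(s, dim=1)          # normalize over i, one column per j
# each column beta[b, :, j] sums to 1: the attention distribution for region j
```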

The output of the attention layer is $\textbf{o} = \left(\textbf{o}_{1}, \textbf{o}_{2}, \ldots, \textbf{o}_{j} , \ldots, \textbf{o}_{N}\right) \in \mathbb{R}^{C \times N}$. Here, $\textbf{o}_j$ is the output for the $j$th region of the image, and it is obtained as:

$$ \textbf{o}_{j} = \textbf{v}\left(\sum^{N}_{i=1}\beta_{j, i}\textbf{h}\left(\textbf{x}_{i}\right)\right) $$
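Sketched in the same style, this step aggregates the value features $\textbf{h}(\textbf{x}_{i})$ with the attention weights and projects the result back to $C$ channels. A random tensor stands in for the $\beta$ computed above, and the `C // 8` bottleneck for $\textbf{h}$ is again an assumption following the paper:

```python
import torch
import torch.nn as nn

C, H, W = 64, 16, 16
N = H * W
h_conv = nn.Conv2d(C, C // 8, kernel_size=1)   # W_h
v_conv = nn.Conv2d(C // 8, C, kernel_size=1)   # W_v, restores C channels

x = torch.randn(1, C, H, W)
beta = torch.softmax(torch.randn(1, N, N), dim=1)  # placeholder attention map

h = h_conv(x).flatten(2)                           # h(x): (1, C // 8, N)
weighted = torch.einsum('bij,bci->bcj', beta, h)   # sum_i beta_{j,i} h(x_i)
o = v_conv(weighted.reshape(1, C // 8, H, W))      # o: back to (1, C, H, W)
```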

Here, $\textbf{h}\left(\textbf{x}_{i}\right) = \textbf{W}_{h}\textbf{x}_{i}$ and $\textbf{v}\left(\textbf{x}_{i}\right) = \textbf{W}_{v}\textbf{x}_{i}$ are two further learned $1 \times 1$ convolutions: $\textbf{h}$ projects each input location $\textbf{x}_{i}$ into a value representation, and $\textbf{v}$ maps the attention-weighted sum of those values back to the original number of channels.

The final output of the SAGAN Self-Attention Module is obtained by multiplying the output of the attention layer by a learnable scalar parameter $\gamma$ and adding the result back to the input feature map:

$$\textbf{y}_{i} = \gamma\textbf{o}_{i} + \textbf{x}_{i} $$

The scalar parameter $\gamma$ is learnable and initialized to $0$, so the module initially acts as an identity mapping. This allows the network to first rely on the cues in the local neighborhood – since this is easier – and then gradually learn to assign more weight to the non-local evidence.
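Putting the pieces together, a self-contained PyTorch sketch of the full module might look as follows. The layer names and the `C // 8` bottleneck are illustrative choices rather than the reference implementation, but the zero initialization of $\gamma$ matches the behavior described above:

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """A minimal sketch of the SAGAN self-attention block described above.

    Layer names and the C // 8 bottleneck are illustrative assumptions,
    not the reference implementation.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Conv2d(channels, channels // 8, 1)  # W_f
        self.g = nn.Conv2d(channels, channels // 8, 1)  # W_g
        self.h = nn.Conv2d(channels, channels // 8, 1)  # W_h
        self.v = nn.Conv2d(channels // 8, channels, 1)  # W_v
        # gamma starts at 0, so the block is initially an identity mapping
        # and the network relies on local cues before non-local ones.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        f = self.f(x).flatten(2)                    # (B, C // 8, N)
        g = self.g(x).flatten(2)                    # (B, C // 8, N)
        h = self.h(x).flatten(2)                    # (B, C // 8, N)
        s = torch.einsum('bci,bcj->bij', f, g)      # s_ij = f(x_i) . g(x_j)
        beta = torch.softmax(s, dim=1)              # attention over i
        o = torch.einsum('bij,bci->bcj', beta, h)   # sum_i beta_{j,i} h(x_i)
        o = self.v(o.reshape(B, C // 8, H, W))      # back to (B, C, H, W)
        return self.gamma * o + x                   # y_i = gamma * o_i + x_i

# Usage: the block preserves the feature-map shape, so it can be dropped
# between convolutional layers without changing the rest of the network.
attn = SelfAttention(64)
y = attn(torch.randn(2, 64, 16, 16))  # -> (2, 64, 16, 16)
```

Because the block preserves the feature-map shape, SAGAN inserts it between convolutional layers of both the generator and the discriminator.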

Motivation for the Study of Self-Attention in GANs

Generative Adversarial Networks (GANs) are neural networks that learn to generate data resembling the data they were trained on. The Self-Attention GAN (SAGAN) is a GAN that uses a self-attention module to enhance the quality of the generated images. SAGAN improves upon previous GAN models by introducing a learned mechanism for capturing long-range dependencies between image regions. The main motivation for studying self-attention in GANs is to let the network better capture the global structure of images by attending to salient regions and learning to complete the global attributes necessary for successful image synthesis.

The SAGAN Self-Attention Module is thus an essential part of the Self-Attention GAN architecture for image synthesis. It improves image quality by attending to salient regions and capturing the long-range dependencies needed to complete the global attributes of realistic images. Further research in this area may improve image quality further and provide better ways of analyzing and understanding image data.
