Non-Local Operation

Non-Local Operation is a component used in deep neural networks to capture long-range dependencies. It is useful for image, sequence, and video problems, and it generalizes the classical non-local means operation from computer vision.

What is Non-Local Operation?

Non-Local Operation is a type of operation for deep neural networks that captures long-range dependencies in the input feature maps. In simple words, it computes the response at a position as a weighted sum of the features at all positions in the input feature maps. This sets it apart from convolutional and recurrent operations, which each process only a local neighborhood at a time.

How Does Non-Local Operation Work?

The non-local operation for deep neural networks can be defined as:

y_i = (1 / C(x)) ∑_j f(x_i, x_j) g(x_j)

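Here, i is the index of the output position (in space, time, or spacetime) whose response is being computed, and j is the index that enumerates all positions in the input. x is the input signal, which can be an image, sequence, or video (or, more commonly, its features), and y is the output signal, which has the same size as x. The pairwise function f computes a scalar that represents the relationship between position i and each position j. The unary function g computes a representation of the input signal at position j. The response is normalized by a factor C(x).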
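The definition above can be sketched directly in NumPy. This is a minimal illustration, not a reference implementation: the helper names (non_local, gaussian_affinity, identity) are hypothetical, f is assumed to be a Gaussian affinity exp(x_i · x_j), g is assumed to be the identity, and C(x) is assumed to be the sum of f(x_i, x_j) over all j.

```python
import numpy as np

def non_local(x, f, g):
    """Apply y_i = (1 / C(x)) * sum_j f(x_i, x_j) g(x_j).

    x is an (N, D) array: N positions with D features each.
    C(x) is assumed here to be sum_j f(x_i, x_j).
    """
    n = x.shape[0]
    out = []
    for i in range(n):
        # Scalar affinity between position i and every position j.
        weights = np.array([f(x[i], x[j]) for j in range(n)])
        # Representation of the input at every position j.
        values = np.stack([g(x[j]) for j in range(n)])
        c = weights.sum()  # assumed normalization factor
        out.append((weights[:, None] * values).sum(axis=0) / c)
    return np.stack(out)

def gaussian_affinity(xi, xj):
    # Assumed choice of f: exp of the dot product.
    return np.exp(np.dot(xi, xj))

def identity(xj):
    # Assumed choice of g: pass the features through unchanged.
    return xj

x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = non_local(x, gaussian_affinity, identity)  # same shape as x: (3, 2)
```

The explicit loops mirror the formula term by term. With these particular choices of f and C(x), the same result can be computed as a row-wise softmax of x xᵀ multiplied by x, which is how it would normally be written in practice.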

The non-local operation differs from convolutional and recurrent operations, which consider only a local neighborhood or the latest time steps, respectively. It also differs from a fully-connected (fc) layer: the non-local operation computes responses based on relationships between different locations, whereas an fc layer uses learned weights, so the relationship between two positions does not depend on the input data.

Why Use Non-Local Operation?

Because every output position draws on every input position, the non-local operation lets the network take into account relationships between distant locations in a single step. This matters for image, sequence, and video problems where information about distant regions is important, and using the operation can improve the accuracy of the network on these types of problems.

Parameterization in Non-Local Operation

In Non-Local Operation, we usually parameterize g as a linear embedding of the form g(x_j) = W_g x_j, where W_g is a learned weight matrix. This embedding can be implemented using 1x1 convolution in space or 1x1x1 convolution in spacetime. The pairwise function f is parameterized by an affinity function, for which several choices are available, such as Gaussian, embedded Gaussian, dot product, or concatenation.
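As a rough sketch of this parameterization, the NumPy code below applies g(x_j) = W_g x_j as a 1x1 convolution over a (channels, height, width) feature map. For f it assumes an embedded Gaussian affinity, exp(θ(x_i) · φ(x_j)) with C(x) = ∑_j f(x_i, x_j), which reduces to a softmax over j. The function names, the embedding matrices w_theta and w_phi, and the tensor sizes are all assumptions made for illustration.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: apply the linear map w (C_out, C_in)
    independently at every position of x (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)

def embedded_gaussian_non_local(x, w_theta, w_phi, w_g):
    _, height, width = x.shape
    n = height * width
    theta = conv1x1(x, w_theta).reshape(-1, n)  # (C_e, N) embedding of x_i
    phi = conv1x1(x, w_phi).reshape(-1, n)      # (C_e, N) embedding of x_j
    g = conv1x1(x, w_g).reshape(-1, n)          # (C_e, N) values W_g x_j

    scores = theta.T @ phi                       # (N, N): theta(x_i) . phi(x_j)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability only
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # divide by C(x) = sum_j f

    y = g @ attn.T                               # y[:, i] = sum_j attn[i, j] g[:, j]
    return y.reshape(-1, height, width)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 5))         # 4 channels on a 3x5 grid
w_theta = rng.normal(size=(2, 4))
w_phi = rng.normal(size=(2, 4))
w_g = rng.normal(size=(2, 4))
y = embedded_gaussian_non_local(x, w_theta, w_phi, w_g)  # shape (2, 3, 5)
```

Flattening the spatial grid into N = H × W positions turns the sum over j into a matrix product, so the whole operation is three 1x1 convolutions and two matrix multiplications.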

Benefits of Non-Local Operation

Non-Local Operation provides several benefits. It supports inputs of variable sizes and produces an output of the corresponding size. In contrast, an fc layer requires fixed-size input and output and loses positional correspondence. Additionally, a non-local operation can be inserted into the earlier parts of a deep neural network, unlike fc layers, which are usually placed at the end. This allows a richer hierarchy that combines both non-local and local information.
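The variable-size property can be illustrated with a small NumPy sketch. It assumes a dot-product affinity f(x_i, x_j) = x_i · x_j normalized by C(x) = N, applied to sequences stored as (N, D) arrays; the function name non_local_dot and the toy sizes are hypothetical. The same weights W_g handle sequences of different lengths, while a fully-connected weight matrix sized for one length rejects the other.

```python
import numpy as np

def non_local_dot(x, w_g):
    """Non-local operation with dot-product affinity, normalized by N.

    x: (N, D) sequence of N positions; w_g: (D_out, D).
    Returns an (N, D_out) array, one response per input position.
    """
    n = x.shape[0]
    affinity = x @ x.T        # (N, N): f(x_i, x_j) = x_i . x_j
    values = x @ w_g.T        # (N, D_out): g(x_j) = W_g x_j
    return affinity @ values / n  # C(x) = N

rng = np.random.default_rng(0)
w_g = rng.normal(size=(3, 3))
short_seq = rng.normal(size=(5, 3))
long_seq = rng.normal(size=(9, 3))

# The same weights work for both lengths, and the output length follows the input.
y_short = non_local_dot(short_seq, w_g)  # shape (5, 3)
y_long = non_local_dot(long_seq, w_g)    # shape (9, 3)

# An fc layer sized for the flattened short sequence (5 * 3 = 15 inputs)
# cannot accept the flattened long sequence (9 * 3 = 27 inputs).
w_fc = rng.normal(size=(15, 15))
fc_short = w_fc @ short_seq.ravel()
try:
    w_fc @ long_seq.ravel()
    fc_accepts_long = True
except ValueError:
    fc_accepts_long = False
```

The non-local weights depend only on the feature dimension D, not on the number of positions N, which is why the operation carries over to inputs of any length.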

Applications of Non-Local Operation

Non-Local Operation is useful for image, sequence, and video problems where information about distant regions is important. It can be used in a wide range of applications, such as:

Object Recognition
Image Classification
Video Analysis
Autonomous Driving
Speech Recognition

Non-Local Operation is a flexible building block that can be easily combined with convolutional and recurrent layers. It has been reported to improve the performance of neural networks in various applications.