Introduction to NesT

NesT is a neural network architecture that is used for image recognition tasks. It has gained a lot of popularity due to its superior performance compared to other state-of-the-art networks such as ResNet and VGG. NesT stands for Nested Scale-Transformers, and it is built using a combination of transformer layers and "nesting" hierarchies.

How NesT Works

One of the unique features of NesT is that it conducts local self-attention on every image block independently, and then "nests" them hierarchically. This means that instead of processing the entire image at once, it breaks it down into smaller blocks and works on them individually. Coupling of processed information between spatially adjacent blocks is achieved through a proposed block aggregation between every two hierarchies. The overall hierarchical structure can be determined by two key hyper-parameters: patch size $S × S$ and number of block hierarchies $T_d$.

How NesT Layers Work

The layers used in NesT consist of canonical transformer layers. Each transformer layer is composed of a multi-head self-attention (MSA) layer followed by a feed-forward fully-connected network (FFN) with skip-connection and Layer normalization. Positional embeddings are added to encode spatial information before feeding into the block. Finally, a nested hierarchy with block aggregation is built -- every four spatially connected blocks are merged into one. All blocks inside each hierarchy share one set of parameters.

Training and Performance

When training NesT, each image is linearly projected to an embedding. All embeddings are partitioned to blocks and flattened to generate final input. The network is trained using backpropagation with a loss function that depends on the task at hand.

NesT has been shown to outperform other state-of-the-art networks such as ResNet and VGG on a variety of classification tasks, including the popular ImageNet dataset. It has achieved state-of-the-art performance on image classification, object detection, and segmentation tasks.

NesT is a powerful neural network architecture that has shown to be superior to other state-of-the-art networks. It is built using a combination of transformer layers and "nesting" hierarchies, allowing it to conduct local self-attention on every image block independently. This approach has proven to be very effective on a variety of image recognition tasks, achieving state-of-the-art results on datasets such as ImageNet.

Great! Next, complete checkout for full access to SERP AI.
Welcome back! You've successfully signed in.
You've successfully subscribed to SERP AI.
Success! Your account is fully activated, you now have access to all content.
Success! Your billing info has been updated.
Your billing was not updated.