SimCLRv2

SimCLRv2 is a semi-supervised method for learning from few labeled examples while making use of a large amount of unlabeled data. It builds on SimCLR, a contrastive learning framework, and improves on it in three major ways.

Larger ResNet Models

SimCLRv2 explores larger ResNet models to fully leverage the power of general pre-training. Unlike SimCLR and other previous work, whose largest model is ResNet-50 (4×), SimCLRv2 trains models that are deeper but less wide. The largest model trained is a 152-layer ResNet with 3× wider channels and selective kernels (SK), a channel-wise attention mechanism that improves the parameter efficiency of the network. Scaling up the model from ResNet-50 to ResNet-152 (3×+SK) yields a 29% relative improvement in top-1 accuracy when fine-tuning on 1% of the labeled examples. In other words, a bigger pre-trained network makes considerably better use of the same small labeled set.
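To make the selective-kernel idea more concrete, here is a minimal NumPy sketch of the fuse-and-select step of an SK unit, assuming the outputs of the convolution branches (each with a different kernel size) are already computed. The convolutions, batch normalization, and real layer sizes are omitted, and the names `sk_fuse`, `w_reduce`, and `w_select` are hypothetical. It illustrates channel-wise attention over branches; it is not the SimCLRv2 implementation.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def sk_fuse(branches, w_reduce, w_select):
    """Fuse-and-select step of a selective-kernel unit (convolutions and batch norm omitted).

    branches: (M, C, H, W) outputs of M conv branches with different kernel sizes.
    w_reduce: (d, C) hypothetical squeeze weights, d < C.
    w_select: (M, C, d) hypothetical per-branch selection weights.
    Returns a (C, H, W) feature map.
    """
    fused = branches.sum(axis=0)              # (C, H, W): combine all branches
    s = fused.mean(axis=(1, 2))               # (C,): global average pooling per channel
    z = np.maximum(w_reduce @ s, 0.0)         # (d,): compact descriptor after ReLU
    logits = w_select @ z                     # (M, C): one score per branch per channel
    attn = softmax(logits, axis=0)            # weights over branches sum to 1 for each channel
    return (attn[:, :, None, None] * branches).sum(axis=0)


# Usage with toy sizes: 2 branches (e.g. 3x3 and 5x5 kernels), 8 channels, 4x4 spatial.
rng = np.random.default_rng(0)
M, C, H, W, d = 2, 8, 4, 4, 4
branches = rng.standard_normal((M, C, H, W))
w_reduce = rng.standard_normal((d, C))
w_select = rng.standard_normal((M, C, d))
out = sk_fuse(branches, w_reduce, w_select)
print(out.shape)  # (8, 4, 4)
```

Because the softmax runs across branches separately for each channel, every channel can pick its own mix of receptive-field sizes, at the cost of only a few small extra weight matrices.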

Increased Capacity of the Non-Linear Network

The non-linear network, also known as the projection head, is made deeper to increase its capacity. In addition, instead of discarding the projection head entirely after pre-training as SimCLR does, SimCLRv2 fine-tunes from one of its middle layers. This change yields a significant improvement for both linear evaluation and fine-tuning with only a few labeled examples. Using a 3-layer projection head and fine-tuning from its 1st layer gives as much as a 14% relative improvement in top-1 accuracy when fine-tuning on 1% of the labeled examples. Part of what the head learns during pre-training therefore remains useful for the downstream task, rather than being thrown away.
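The sketch below contrasts the two fine-tuning choices, assuming a plain 3-layer MLP head with ReLU between layers (batch normalization omitted) and toy dimensions. The helpers `make_projection_head` and `projection_head` and the layer sizes are hypothetical. Passing `upto=0` corresponds to SimCLR's choice of discarding the head, while `upto=1` corresponds to fine-tuning from the first layer of the head.

```python
import numpy as np


def make_projection_head(rng, dims):
    """Hypothetical helper: random (weight, bias) pairs for an MLP with len(dims) - 1 layers."""
    return [(rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out))
            for n_in, n_out in zip(dims[:-1], dims[1:])]


def projection_head(h, layers, upto=None):
    """Apply the first `upto` head layers to encoder output h (all layers if upto is None).

    ReLU follows every layer except the head's final output layer; batch norm is omitted.
    """
    n = len(layers) if upto is None else upto
    x = h
    for i, (w, b) in enumerate(layers[:n]):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


# Usage with toy sizes: a 3-layer head mapping 16-d encoder features to 8-d embeddings.
rng = np.random.default_rng(0)
head = make_projection_head(rng, [16, 16, 16, 8])
h = rng.standard_normal((4, 16))                 # batch of 4 encoder outputs
z = projection_head(h, head)                     # (4, 8): fed to the contrastive loss
feat_simclr = projection_head(h, head, upto=0)   # (4, 16): SimCLR discards the head
feat_v2 = projection_head(h, head, upto=1)       # (4, 16): keep the head's first layer
w_task = 0.01 * rng.standard_normal((16, 10))    # hypothetical 10-class linear classifier
logits = feat_v2 @ w_task                        # (4, 10)
```

During fine-tuning, the kept head layer would be updated together with the encoder and the new classifier; only the layers after `upto` are dropped.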

Incorporating a Memory Mechanism

SimCLRv2 incorporates the memory mechanism from MoCo, which designates a memory network (with a moving average of weights for stabilization) whose output is buffered as negative examples. Because SimCLRv2 already trains with large mini-batches that supply many contrasting negative examples, the gain from this change is modest: about 1% for linear evaluation as well as for fine-tuning on 1% of the labeled examples.
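A minimal NumPy sketch of the two pieces of this mechanism follows, under simplifying assumptions: a moving-average update for the memory network's weights, and a first-in-first-out queue of its past outputs that serve as extra negatives in an InfoNCE-style contrastive loss alongside the in-batch negatives. The names `momentum_update`, `NegativeQueue`, and `contrastive_loss_with_queue`, the momentum `m`, and the temperature `tau` are illustrative choices, not values taken from SimCLRv2. The loss is computed in one direction only and no gradients are taken.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each vector to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def momentum_update(online_params, memory_params, m=0.999):
    """Memory-network weights follow an exponential moving average of the online weights."""
    return [m * w_mem + (1.0 - m) * w_on for w_on, w_mem in zip(online_params, memory_params)]


class NegativeQueue:
    """Fixed-size FIFO buffer of past memory-network outputs, reused as negatives."""

    def __init__(self, size, dim, rng):
        self.buffer = l2_normalize(rng.standard_normal((size, dim)))  # random initial fill
        self.ptr = 0  # index of the oldest entry, overwritten next

    def enqueue(self, keys):
        """Overwrite the oldest entries with the newest (already normalized) keys."""
        idx = (self.ptr + np.arange(keys.shape[0])) % self.buffer.shape[0]
        self.buffer[idx] = keys
        self.ptr = (self.ptr + keys.shape[0]) % self.buffer.shape[0]


def contrastive_loss_with_queue(q, k, queued, tau=0.1):
    """Cross-entropy where row i of q must pick key i among all batch keys plus queued keys.

    q: (B, D) online-network outputs for one view.
    k: (B, D) memory-network outputs for the other view (treated as constants).
    queued: (K, D) buffered negatives.
    """
    q, k = l2_normalize(q), l2_normalize(k)
    candidates = np.concatenate([k, queued], axis=0)        # (B + K, D)
    logits = q @ candidates.T / tau                         # (B, B + K)
    row_max = logits.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    pos = logits[np.arange(q.shape[0]), np.arange(q.shape[0])]  # positives on the diagonal
    return float(np.mean(log_norm - pos))


# Usage with toy sizes: batch B, embedding dim D, queue length K.
rng = np.random.default_rng(0)
B, D, K = 4, 8, 16
queue = NegativeQueue(K, D, rng)
q = rng.standard_normal((B, D))
k = q + 0.05 * rng.standard_normal((B, D))   # stand-in for the memory network's outputs
loss = contrastive_loss_with_queue(q, k, queue.buffer)
queue.enqueue(l2_normalize(k))               # newest keys replace the oldest ones
```

The queue lets each step compare against more negatives than the batch alone provides; since the batch is already large in SimCLRv2, this is consistent with the small gain reported above.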

In summary, SimCLRv2 is a semi-supervised learning method that improves on SimCLR's image-recognition accuracy, particularly when only a few labeled examples are available. It uses larger ResNet models, a deeper projection head that is partly kept for fine-tuning, and a memory mechanism that buffers additional negative examples.