Noisy Student

Noisy Student Training is a method used in machine learning to improve the accuracy of image recognition models. It is a semi-supervised learning approach that combines self-training and distillation with the use of equal-or-larger student models and noise added to the student during learning. The training process involves a teacher model, a student model, and unlabeled images.

What is Noisy Student Training?

Noisy Student Training is a machine learning technique that seeks to improve on two existing methods: self-training and distillation. Self-training involves using a model to predict labels for unlabeled data and then retraining the model on the newly labeled data. Distillation involves training a smaller model to mimic the behavior of a larger model.
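To make the idea of distillation concrete, the sketch below shows one common way a student can be trained to mimic a teacher: the teacher's outputs are turned into a soft probability distribution, and the student is penalized by its cross-entropy against that distribution. The function names, the temperature value, and the example logits are illustrative assumptions, not details taken from the Noisy Student method itself.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(teacher_probs, student_logits, temperature=1.0):
    """Cross-entropy of the student's distribution against the teacher's soft targets."""
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# Hypothetical teacher outputs for one image with three classes.
teacher_logits = [2.0, 0.5, -1.0]
targets = softmax(teacher_logits, temperature=2.0)

# A student whose logits resemble the teacher's incurs a lower loss
# than a student that ranks the classes in the opposite order.
loss_close = soft_cross_entropy(targets, [1.9, 0.6, -1.1], temperature=2.0)
loss_far = soft_cross_entropy(targets, [-1.0, 0.5, 2.0], temperature=2.0)
print(loss_close < loss_far)
```

Minimizing this loss over many examples pulls the student's predictions toward the teacher's, which is the sense in which the student "mimics" the larger model.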

Noisy Student Training combines these two methods by training a teacher model on labeled images and then using the teacher to generate pseudo labels on unlabeled images. The student model is then trained on the combination of labeled and pseudo labeled images. The algorithm is iterated a few times by treating the student as a teacher to relabel the unlabeled data and training a new student.

In Noisy Student Training, the student model is made at least as large as the teacher so that it has enough capacity to learn from the larger combined dataset. In addition, the student is trained with noise: input noise, such as data augmentation, and model noise, such as dropout and stochastic depth. Because the teacher generates its pseudo labels without noise, the noised student has to reproduce those labels from perturbed inputs and with a perturbed network, which forces it to learn harder from the pseudo labels. This improves the generalization of the model and reduces overfitting.
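As a rough illustration of input noise, the sketch below applies a few randomly chosen transformations to a tiny grayscale image stored as a list of pixel rows. It is written in the spirit of random-augmentation policies such as RandAugment, but the three operations, their magnitude scaling, and the function names are simplified assumptions made for this example, not the operations used by any particular library.

```python
import random

def flip_horizontal(image, magnitude):
    """Mirror each row; the magnitude is ignored for this operation."""
    return [row[::-1] for row in image]

def shift_brightness(image, magnitude):
    """Brighten every pixel by an amount that grows with the magnitude."""
    delta = 10 * magnitude
    return [[min(255, p + delta) for p in row] for row in image]

def roll_x(image, magnitude):
    """Circularly shift each row to the right by `magnitude` pixels."""
    k = magnitude % len(image[0])
    return [row[-k:] + row[:-k] if k else row[:] for row in image]

# Hypothetical pool of operations for this sketch.
OPS = [flip_horizontal, shift_brightness, roll_x]

def rand_augment(image, num_ops=2, magnitude=3, rng=random):
    """Apply num_ops operations sampled with replacement, all at one shared magnitude."""
    for op in rng.choices(OPS, k=num_ops):
        image = op(image, magnitude)
    return image

image = [[0, 50, 100], [150, 200, 250]]
print(rand_augment(image, rng=random.Random(0)))
```

Each call can produce a different version of the same image, so the student rarely sees exactly the input that the teacher labeled.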

How Does Noisy Student Training Work?

Noisy Student Training has three main steps:

1. Train a teacher model on labeled images
2. Use the teacher to generate pseudo labels on unlabeled images
3. Train a student model on the combination of labeled images and pseudo labeled images

After the student model is trained, it is treated as a teacher to relabel the unlabeled data and train a new student. This process is iterated a few times to improve the accuracy of the model.
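The three steps and the teacher-student iteration can be sketched with a deliberately tiny stand-in model. The example below assumes a nearest-centroid classifier on one-dimensional inputs and uses Gaussian jitter as the student's noise; both are placeholders chosen so the loop runs without any libraries. The toy model has fixed capacity, so the student here is only equal in size to the teacher, and all names and data values are hypothetical.

```python
import random

def train(examples, noise=0.0, rng=None):
    """Fit a toy nearest-centroid classifier on (x, label) pairs.

    When noise > 0, each input is jittered first, standing in for the
    input noise applied to the student.
    """
    sums, counts = {}, {}
    for x, label in examples:
        if noise and rng is not None:
            x += rng.gauss(0.0, noise)
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

def noisy_student(labeled, unlabeled, iterations=3, noise=0.1, seed=0):
    rng = random.Random(seed)
    teacher = train(labeled)  # step 1: teacher trained on labeled data only
    for _ in range(iterations):
        # step 2: the teacher pseudo-labels the unlabeled data
        pseudo = [(x, predict(teacher, x)) for x in unlabeled]
        # step 3: a noised student learns from labeled plus pseudo-labeled data
        student = train(labeled + pseudo, noise=noise, rng=rng)
        teacher = student  # the student becomes the next teacher
    return teacher

labeled = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
unlabeled = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]
model = noisy_student(labeled, unlabeled)
print(predict(model, 1.2), predict(model, 8.8))
```

In a real setting, `train` would fit a neural network and the student would usually be a larger architecture than the teacher; the control flow of the loop stays the same.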

Noisy Student Training differs from plain self-training and distillation in two ways. First, the student is at least as large as the teacher, whereas distillation typically trains a smaller student; the extra capacity lets the student make use of the larger combined dataset. Second, noise is added to the student during training, which improves generalization and reduces overfitting.

The noise added during training can be either input noise or model noise. Input noise is added to the data using data augmentation techniques such as RandAugment, which applies random transformations to the input data. Model noise is added to the model itself through techniques such as dropout, which randomly drops out nodes in the network during training, and stochastic depth, which randomly skips over layers in the network during training.
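The two kinds of model noise can be sketched on plain Python lists. The example below assumes the common "inverted" form of dropout, which rescales the surviving units, and a residual block whose layer branch is skipped at random for stochastic depth. The function names, the survival probability, and the `double` layer are hypothetical choices for illustration; deep learning frameworks provide their own implementations.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def residual_block(x, layer, survival_prob, rng, training=True):
    """Residual block with stochastic depth: the layer branch is randomly skipped."""
    if training:
        if rng.random() >= survival_prob:
            return list(x)  # branch skipped: only the identity path remains
        return [xi + fi for xi, fi in zip(x, layer(x))]
    # At inference the branch is always used, scaled by its survival probability.
    return [xi + survival_prob * fi for xi, fi in zip(x, layer(x))]

rng = random.Random(0)
double = lambda v: [2.0 * vi for vi in v]  # hypothetical layer for the example
x = [1.0, 2.0, 3.0]

print(dropout(x, rate=0.5, rng=rng))
print(residual_block(x, double, survival_prob=0.8, rng=rng))
print(residual_block(x, double, survival_prob=0.8, rng=rng, training=False))
```

Both kinds of noise are active only in training mode, which fits the setup in which the student is noised during learning while pseudo labels are produced by a model run without noise.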

What Are the Benefits of Noisy Student Training?

Noisy Student Training has several benefits over plain self-training and distillation. One of the main benefits is improved accuracy. By training an equal-or-larger student model on a combination of labeled and pseudo labeled data, the model can achieve higher accuracy than models trained using self-training or distillation alone. Additionally, the noise added during training improves generalization and reduces overfitting, making the model more robust to new and unseen data.

Noisy Student Training is also a semi-supervised learning approach, which means it is more label-efficient than purely supervised approaches that depend on large volumes of labeled data. Because it learns from unlabeled data alongside labeled data, it can improve accuracy without requiring additional labeling effort, although each teacher-student iteration adds training compute.

In summary, Noisy Student Training is a machine learning technique that seeks to improve the accuracy of image recognition models. It combines self-training and distillation with the use of equal-or-larger student models and noise added to the student during learning. By using both labeled and unlabeled data, and adding noise during training, models trained using Noisy Student Training can achieve higher accuracy and better generalization than models trained on the labeled data alone.