Seesaw Loss

Understanding Seesaw Loss: A Dynamic Loss Function for Long-Tailed Instance Segmentation

Instance segmentation is a crucial task in computer vision that involves detecting every object instance in an image and delineating each one with a pixel-level mask. The task has many real-world applications, such as autonomous driving, robotics, and medical imaging. A major challenge in instance segmentation, however, is the unbalanced distribution of object classes in the real world. Some classes have an abundance of instances, while others have considerably fewer examples, which biases the learning process towards the dominant classes. This issue is known as a long-tailed distribution, and it can cause a significant drop in performance on tail classes, leading to unreliable and inaccurate predictions.

To address this challenge, researchers proposed a dynamic loss function called Seesaw Loss, which re-balances the gradients of positive and negative samples for each class. Seesaw Loss aims to mitigate the overwhelming punishments on tail classes while compensating for the risk of misclassification caused by the diminished penalties.

What is Seesaw Loss, and why is it important?

Seesaw Loss is a dynamic loss function that balances the influence of tail classes and head classes in long-tailed instance segmentation. It combines two complementary factors: a mitigation factor and a compensation factor. The mitigation factor reduces the penalty on tail categories by scaling down the gradient of their negative samples according to the ratio of cumulative training instances between classes. The compensation factor, in turn, increases the penalty on misclassified instances to avoid an excess of false positives for tail categories.

The synergy between the two factors lets Seesaw Loss address a limitation of existing loss functions, which over-penalize tail classes with limited instances and thus produce sub-optimal predictions. The loss re-weights the penalty between every pair of classes with a balancing factor computed on the fly, ensuring that optimization does not simply favor the dominant classes.

How Does Seesaw Loss Work?

Seesaw Loss is a modification of the cross-entropy loss commonly used for the classification branch of instance segmentation models. It takes as input the logits $z\_{i}$ and the one-hot ground-truth labels $y\_{i}$ and computes the probability of class $i$ with a softmax-like function that normalizes the logits into a probability distribution over all $C$ classes.

$$ L\_{seesaw}\left(\mathbf{z}\right) = - \sum^{C}\_{i=1}y\_{i}\log\left(\hat{\sigma}\_{i}\right) $$ $$ \text{with } \hat{\sigma}\_{i} = \frac{e^{z\_{i}}}{\sum^{C}\_{j\neq i}\mathcal{S}\_{ij}e^{z\_{j}}+e^{z\_{i}} } $$

The balancing factor in Seesaw Loss is $\mathcal{S}\_{ij}$, which controls how strongly a positive sample of class $i$ penalizes class $j$ as a negative. Because the loss on such a sample is $-\log\hat{\sigma}\_{i}$, the gradient it places on a negative logit $z\_{j}$ is scaled by $\mathcal{S}\_{ij}$, so choosing $\mathcal{S}\_{ij} < 1$ directly softens the punishment on class $j$. The function determines $\mathcal{S}\_{ij}$ by combining two terms: the mitigation factor $\mathcal{M}\_{ij}$ and the compensation factor $\mathcal{C}\_{ij}$.

$$ \mathcal{S}\_{ij} =\mathcal{M}\_{ij} · \mathcal{C}\_{ij}  $$

The mitigation factor $\mathcal{M}\_{ij}$ reduces the gradient that samples of class $i$ exert on the negative logits of class $j$, based on the ratio of instance counts accumulated during training: $\mathcal{M}\_{ij} = \left(N\_{j}/N\_{i}\right)^{p}$ when $N\_{i} > N\_{j}$, and $1$ otherwise, where $N\_{i}$ and $N\_{j}$ are the cumulative instance counts and $p$ is a hyperparameter. The goal is to relax the penalty on tail classes, whose few positive samples would otherwise be drowned out by the negative gradients coming from head-class samples. Because the counts are accumulated online rather than pre-computed from dataset statistics, the factor adapts to the data distribution the model actually observes.
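As a concrete illustration, here is a minimal PyTorch sketch of the mitigation factor. The function name `mitigation_factor` is a hypothetical helper introduced here for illustration; the default $p = 0.8$ is the value used in the paper's experiments.

```python
import torch

def mitigation_factor(cum_counts: torch.Tensor, p: float = 0.8) -> torch.Tensor:
    """M[i, j] = (N_j / N_i) ** p when N_j < N_i, else 1.

    cum_counts: (C,) float tensor of instance counts accumulated so far.
    Returns a (C, C) matrix of mitigation factors.
    """
    # ratio[i, j] = N_j / N_i (clamp the denominator to avoid division by
    # zero for classes that have not been seen yet)
    ratio = cum_counts[None, :] / cum_counts[:, None].clamp(min=1.0)
    # Only scale *down*: classes with more instances than class i keep M = 1.
    return ratio.clamp(max=1.0) ** p
```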

The compensation factor $\mathcal{C}\_{ij}$ increases the penalty on class $j$ whenever a sample of class $i$ is misclassified as class $j$: if the predicted probability $\sigma\_{j}$ of the wrong class exceeds the probability $\sigma\_{i}$ of the true class, then $\mathcal{C}\_{ij} = \left(\sigma\_{j}/\sigma\_{i}\right)^{q}$ with hyperparameter $q$, and $\mathcal{C}\_{ij} = 1$ otherwise. This compensates for the risk of misclassification introduced by the diminished penalties, encouraging the model to learn the features that discriminate each class instead of becoming overconfident in head classes or mistaking tail classes for other categories.
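Putting the two factors together, a minimal sketch of the full loss might look as follows. It assumes the `mitigation_factor` helper above; `compensation_factor` and `seesaw_cross_entropy` are likewise hypothetical names, and `q = 2.0` is the value used in the paper's experiments.

```python
import torch
import torch.nn.functional as F

def compensation_factor(probs: torch.Tensor, labels: torch.Tensor,
                        q: float = 2.0) -> torch.Tensor:
    """C[n, j] = (sigma_j / sigma_i) ** q when sigma_j > sigma_i, else 1,
    for each sample n whose ground-truth class is i.

    probs: (N, C) softmax probabilities; labels: (N,) ground-truth class ids.
    """
    true_prob = probs.gather(1, labels[:, None])      # sigma_i, shape (N, 1)
    ratio = probs / true_prob.clamp(min=1e-12)        # sigma_j / sigma_i
    # Only scale *up*: classes already well separated keep C = 1.
    return ratio.clamp(min=1.0) ** q

def seesaw_cross_entropy(logits, labels, cum_counts, p=0.8, q=2.0):
    """Seesaw loss for one batch: replaces e^{z_j} with S[n, j] * e^{z_j}
    for every non-target class j, then applies standard cross-entropy."""
    probs = torch.softmax(logits.detach(), dim=1)     # stop-gradient for C
    m = mitigation_factor(cum_counts, p)[labels]      # (N, C), rows M[i, :]
    c = compensation_factor(probs, labels, q)         # (N, C)
    s = m * c
    s.scatter_(1, labels[:, None], 1.0)               # S = 1 for the true class
    # Adding log S to the logits reproduces the seesaw softmax denominator.
    adjusted = logits + torch.log(s.clamp(min=1e-12))
    return F.cross_entropy(adjusted, labels)
```

Folding $\mathcal{S}\_{ij}$ into the logits is just an implementation convenience: since $\mathcal{S}\_{ij}e^{z\_{j}} = e^{z\_{j}+\log\mathcal{S}\_{ij}}$, standard cross-entropy on the adjusted logits computes exactly $-\log\hat{\sigma}\_{i}$.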

Key Features of Seesaw Loss

The Seesaw loss function stands out from other loss functions because of its unique features:

  • Dynamic balancing: Seesaw Loss adapts to the distribution of instances across categories, which changes continually during training. It uses the ratio of accumulated instance counts between head and tail classes to control the penalties applied to tail classes (see the usage sketch after this list).
  • Compensation mechanism: reducing the penalties on tail classes risks more false positives, so the compensation factor counteracts this by boosting the penalty whenever a sample is misclassified, encouraging the model to learn the features unique to each class and improving prediction accuracy across all classes.
  • Tackles sub-optimal performance on long-tailed datasets: Seesaw Loss directly addresses the under-representation of low-frequency classes that makes standard losses perform poorly on long-tailed benchmarks.
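To make the dynamic-balancing behavior concrete, here is a usage sketch for the functions defined earlier; the class count, batch size, and random inputs are placeholders, not values from the paper.

```python
C = 4                                    # hypothetical number of classes
cum_counts = torch.zeros(C)              # instance counts accumulated online

for step in range(3):                    # stand-in for the training loop
    logits = torch.randn(8, C, requires_grad=True)   # dummy classifier outputs
    labels = torch.randint(0, C, (8,))               # dummy ground-truth labels
    # Update the cumulative counts before computing the loss, so the
    # mitigation factor reflects the distribution seen so far.
    cum_counts += torch.bincount(labels, minlength=C).float()
    loss = seesaw_cross_entropy(logits, labels, cum_counts)
    loss.backward()                      # negative gradients on tail classes are rebalanced
```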

Conclusions

Seesaw Loss is a dynamic loss function that addresses the limitations of cross-entropy loss in long-tailed instance segmentation. It mitigates the overwhelming penalties on tail classes while compensating for the risk of misclassification caused by the diminished penalties. Because it adapts to the distribution of instances between head and tail classes throughout training, the optimization does not simply favor the dominant classes, and the compensation mechanism pushes the model to learn the features unique to each class rather than trading fewer false negatives on tail classes for more false positives.

Seesaw loss provides a solution to a pervasive problem in instance segmentation and can significantly improve prediction accuracy on long-tailed datasets. The proposed approach can be extended to various other machine learning applications and can have a significant impact on real-life scenarios.
