YOLOX

YOLOX is an object detector built by applying a series of modifications to YOLOv3 with a DarkNet53 backbone. It replaces the original coupled prediction head with a decoupled one: a 1x1 convolution first reduces the number of feature channels, and two parallel branches then handle classification and regression separately. It also adds Mosaic and MixUp to the augmentation strategy. This article explores these modifications and the features they give the detector.

YOLOX Features

YOLOX is an anchor-free, single-stage detector. It detects objects at several scales and, at the time of its release, reported a better speed-accuracy trade-off than comparable real-time detectors in the YOLO family. The features that set it apart are:

Decoupled Head

The first modification is the replacement of YOLO's head with a decoupled one. In YOLOv3, a single set of convolutions produces both the class scores and the box coordinates, even though classification and regression pull the shared features in different directions. The decoupled head gives each task its own branch. This helps in two ways. First, training converges noticeably faster. Second, detection accuracy improves. The extra branches are not free: they add a small amount of inference time, which YOLOX keeps low by reducing the channel width with a 1x1 convolution before the branches split.
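The layout described above can be sketched for a single feature-map location. This is only an illustration under stated assumptions: every name here (DecoupledHeadSketch, conv1x1, apply_layer, project) is hypothetical, the weights are random, ReLU stands in for the real activation, and each branch is collapsed to one 1x1 layer where the real head uses 3x3 convolutions over the whole feature map.

```python
import random


def conv1x1(in_channels, out_channels, rng):
    """Random weights standing in for a 1x1 convolution at one location."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_channels)]
            for _ in range(out_channels)]


def apply_layer(weights, features):
    """Matrix-vector product followed by ReLU (a stand-in activation)."""
    return [max(0.0, sum(w * f for w, f in zip(row, features)))
            for row in weights]


def project(weights, features):
    """Matrix-vector product without activation, used for final outputs."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


class DecoupledHeadSketch:
    """Hypothetical per-location sketch of a decoupled detection head."""

    def __init__(self, in_channels, num_classes, hidden=256, seed=0):
        rng = random.Random(seed)
        # Shared 1x1 convolution that reduces the channel width.
        self.stem = conv1x1(in_channels, hidden, rng)
        # Two parallel branches: one per task.
        self.cls_branch = conv1x1(hidden, hidden, rng)
        self.reg_branch = conv1x1(hidden, hidden, rng)
        # Output layers: class scores, 4 box values, 1 objectness value.
        self.cls_out = conv1x1(hidden, num_classes, rng)
        self.box_out = conv1x1(hidden, 4, rng)
        self.obj_out = conv1x1(hidden, 1, rng)

    def __call__(self, features):
        shared = apply_layer(self.stem, features)
        cls_features = apply_layer(self.cls_branch, shared)
        reg_features = apply_layer(self.reg_branch, shared)
        return {
            "cls": project(self.cls_out, cls_features),
            "box": project(self.box_out, reg_features),
            "obj": project(self.obj_out, reg_features),
        }


head = DecoupledHeadSketch(in_channels=16, num_classes=3, hidden=8)
outputs = head([0.5] * 16)
```

The point of the sketch is the data flow: the class scores never pass through the regression branch, and the box values never pass through the classification branch.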

Mosaic and MixUp Data Augmentation

Another feature of YOLOX is the adoption of the Mosaic and MixUp data augmentation strategies. The two work differently: Mosaic stitches several training images (typically four) into a single image, so that objects appear at new scales and in new contexts, whereas MixUp blends two images together pixel by pixel and keeps the labels of both. Each effectively synthesizes new training samples from existing ones, which helps the model generalize and reach better accuracy.
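The difference between the two operations is easiest to see on tiny images. The sketch below is a simplification under stated assumptions: images are grayscale lists of rows, all of the same size, and the function names mosaic and mixup are illustrative. A real detection pipeline also randomly rescales and crops the Mosaic tiles and carries the bounding boxes of every source image into the result, which is omitted here.

```python
def mosaic(top_left, top_right, bottom_left, bottom_right):
    """Stitch four equally sized images (lists of rows) into a 2x2 grid."""
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom


def mixup(image_a, image_b, lam=0.5):
    """Blend two equally sized images pixel by pixel with weight lam."""
    return [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


stitched = mosaic([[1]], [[2]], [[3]], [[4]])
blended = mixup([[0.0, 2.0]], [[4.0, 2.0]], lam=0.5)
```

Mosaic changes the layout of the image and leaves pixel values alone; MixUp leaves the layout alone and changes pixel values.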

Anchor-Free Mechanism

YOLOX removes the anchor mechanism, which makes it anchor-free. Anchors are predefined reference boxes of fixed widths and heights; an anchor-based detector predicts each object as an offset from one of them. This has drawbacks. Anchor sizes are usually tuned to a particular dataset, so they transfer poorly to other domains, and using several anchors per location multiplies the number of predictions the head must produce. YOLOX instead makes one prediction per grid location and predicts four values directly: two offsets relative to the top-left corner of the grid cell, and the height and width of the box.
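Without anchors, turning a raw prediction into a box needs only the grid cell position and the stride of the feature level. The following is a minimal sketch, assuming the sizes are predicted in log space and scaled by the stride, which is a common convention for this kind of head; the function name decode_prediction and its parameters are illustrative.

```python
import math


def decode_prediction(raw, grid_x, grid_y, stride):
    """Convert one anchor-free prediction to a box in input-image pixels.

    raw is (dx, dy, log_w, log_h): offsets from the top-left corner of the
    grid cell, and log-scale width and height (log-space is an assumption).
    """
    dx, dy, log_w, log_h = raw
    center_x = (grid_x + dx) * stride
    center_y = (grid_y + dy) * stride
    width = math.exp(log_w) * stride
    height = math.exp(log_h) * stride
    return center_x, center_y, width, height


box = decode_prediction((0.5, 0.5, 0.0, 0.0), grid_x=2, grid_y=3, stride=8)
```

No anchor width or height appears anywhere in the computation, which is what removes the need to tune anchor sizes per dataset.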

SimOTA for Label Assignment

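Lastly, SimOTA is a simplified version of Optimal Transport Assignment (OTA). Label assignment is the training-time step that decides which predictions count as positive samples for each ground-truth object. SimOTA computes a cost for every prediction and ground-truth pair, combining the classification loss and the regression loss, and then, for each ground truth, selects the k lowest-cost predictions within a fixed region around the object's center as positives. The value of k is not fixed: it is estimated separately for each ground truth. Every other prediction is treated as a negative. Choosing positives by cost rather than by a fixed geometric rule gives the detector cleaner training targets and therefore better accuracy.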
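The top-k selection for a single ground-truth box can be sketched as follows. The assumptions are labeled in the code: the weight lam=3.0 and the rule of estimating k from the sum of the ten largest IoUs are taken from common descriptions of this strategy and should be treated as illustrative defaults, and the names pair_cost and dynamic_k_select are hypothetical.

```python
def pair_cost(cls_loss, reg_loss, lam=3.0):
    """Cost of matching one prediction to one ground truth.

    lam balances the two terms; 3.0 is an assumed default.
    """
    return cls_loss + lam * reg_loss


def dynamic_k_select(costs, ious, max_candidates=10):
    """Pick positive prediction indices for one ground-truth box.

    costs[i] and ious[i] describe candidate prediction i.
    k is estimated as the truncated sum of the largest IoUs (assumption),
    with a minimum of one positive.
    """
    top_ious = sorted(ious, reverse=True)[:max_candidates]
    k = max(1, int(sum(top_ious)))
    ranked = sorted(range(len(costs)), key=lambda i: costs[i])
    return ranked[:k]


candidate_ious = [0.9, 0.8, 0.7, 0.1, 0.05]
candidate_costs = [0.2, 0.5, 0.1, 0.9, 1.0]
positives = dynamic_k_select(candidate_costs, candidate_ious)
```

A ground truth that many predictions already overlap well receives more positives, while a poorly covered one still receives at least one.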

YOLOX Modifications

The changes below are what separate YOLOX from its YOLOv3 baseline. Several of them correspond to the features above and are revisited here from the implementation side:

FPN-based Head

The YOLOX head sits on top of a feature pyramid network (FPN). The pyramid itself is inherited from YOLOv3 rather than introduced by YOLOX: the backbone features are combined into several levels of different resolution, and a prediction head is attached to each level. High-resolution levels are responsible for smaller objects and low-resolution levels for larger ones, which is how the detector handles objects on different scales.
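A short calculation shows how many predictions such a pyramid produces. The strides of 8, 16 and 32 and the 640-pixel input are assumptions chosen because they are typical for YOLO-style detectors; the function name predictions_per_level is illustrative.

```python
def predictions_per_level(image_size, strides=(8, 16, 32), per_location=1):
    """Number of predictions on each pyramid level for a square input.

    strides are assumed values; per_location is 1 for an anchor-free head.
    """
    return {
        stride: (image_size // stride) ** 2 * per_location
        for stride in strides
    }


counts = predictions_per_level(640)
total = sum(counts.values())
```

With these assumed settings the finest level contributes most of the predictions, and setting per_location to 3 shows how an anchor-based head with three anchors would triple the total.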

Decoupled Head

As mentioned earlier, YOLOX adds a decoupled head. Its main effects are faster convergence during training and higher accuracy than the coupled YOLOv3 head, at the price of a small increase in inference time. The YOLOX paper also reports that the decoupled head matters for the end-to-end (NMS-free) variant of the detector, which loses considerably more accuracy with a coupled head than with a decoupled one.

Separate Classification and Regression Features

This modification concerns how the decoupled head uses its feature maps. After the shared 1x1 convolution, one branch produces features used only for classification and the other produces features used only for regression, that is, the box coordinates and the objectness (IoU-aware) output. Because each task has its own feature maps, neither has to compromise for the other, and both the precision and the recall of the detector benefit.

MixUp and Mosaic Augmentation

Mosaic and MixUp are integrated into the YOLOX training pipeline. They expose the model to a far more varied set of training images, which improves accuracy. There is a limitation, however: heavily augmented images do not look like the images the detector sees at inference time. YOLOX addresses this by turning both augmentations off for the last 15 epochs of training. For its smallest models, the paper additionally weakens the augmentation, for example by narrowing the scale range used by Mosaic. The paper also notes that ImageNet pre-training stops being beneficial once this strong augmentation is used, so the models are trained from scratch.
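One way the YOLOX paper limits the side effects of strong augmentation is by switching Mosaic and MixUp off for the final 15 epochs of training. A minimal sketch of such a schedule is shown here; the function name and the zero-based epoch convention are assumptions, and the defaults of 300 total epochs and 15 final epochs follow the training setup described in the paper.

```python
def use_strong_augmentation(epoch, total_epochs=300, no_aug_epochs=15):
    """Return True while Mosaic and MixUp should still be applied.

    epoch is zero-based (an assumption of this sketch).
    """
    return epoch < total_epochs - no_aug_epochs


schedule = [use_strong_augmentation(e) for e in range(300)]
```

The last epochs then train on images that look like inference-time inputs, which lets the model adapt to the real data distribution before training ends.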

No Anchor Box System

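Anchor boxes have been widely adopted and have helped many detectors reach high accuracy. They come with costs, however. Anchor sizes must be chosen for each dataset, usually by clustering the training boxes, and placing several anchors at every location multiplies the number of predictions the head has to compute and post-process. YOLOX eliminates the anchor boxes: each grid location on each pyramid level makes a single prediction, with no prior box size attached. During training, the location at an object's center serves as the positive sample, and a predefined scale range determines which pyramid level is responsible for the object. According to the paper, this reduces the parameters and GFLOPs of the detector while also improving accuracy.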

SimOTA for Label Assignment

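SimOTA replaces the simpler center-based assignment rule and improves how positive samples are chosen for each object. It derives from OTA, which treats label assignment as an optimal transport problem between ground truths and predictions and so handles the one-to-many and many-to-many cases in a principled way. Solving that problem exactly requires an iterative solver (Sinkhorn-Knopp), which the paper reports adds about 25% to training time. SimOTA approximates the solution with a dynamic top-k strategy instead, keeping most of the accuracy benefit at a lower cost. When the same prediction is selected for more than one ground truth, it is kept only for one of them. SimOTA operates during training only and has no effect on inference.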
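The many-to-many case arises when the same prediction is selected as a positive for several ground truths. A common way to resolve it, and the one the YOLOX reference implementation is understood to take, is to keep the prediction only for the ground truth with the lowest cost. The sketch below assumes that rule; the function name resolve_conflicts and the data layout are illustrative.

```python
def resolve_conflicts(assignments, cost):
    """Match each selected prediction to at most one ground truth.

    assignments maps a ground-truth index to the prediction indices
    selected for it. cost[gt][pred] is the matching cost of that pair.
    Returns a dict mapping prediction index to its single ground truth,
    chosen as the lowest-cost candidate (an assumed tie-breaking rule).
    """
    owner = {}
    for gt, predictions in assignments.items():
        for pred in predictions:
            if pred not in owner or cost[gt][pred] < cost[owner[pred]][pred]:
                owner[pred] = gt
    return owner


selected = {0: [0, 1], 1: [1, 2]}
costs = [
    [0.1, 0.6, 0.9],  # costs for ground truth 0
    [0.8, 0.3, 0.2],  # costs for ground truth 1
]
matches = resolve_conflicts(selected, costs)
```

Here prediction 1 is claimed by both ground truths and ends up with ground truth 1, where its cost is lower.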

YOLOX is a modification of YOLOv3 that substantially improves its accuracy while remaining a real-time detector. The decoupled head speeds up convergence and raises accuracy, Mosaic and MixUp strengthen training, the anchor-free design simplifies the head and removes dataset-specific anchor tuning, and SimOTA gives the model better training targets. Together these changes allowed YOLOX to surpass many earlier detectors and showed that the YOLO series could benefit from advances made elsewhere in object detection.