Grid Sensitive

Grid Sensitive is a technique introduced by YOLOv4 that makes bounding box center predictions in object detection more accurate. In YOLOv3, the model had difficulty predicting box centers that lie exactly on the boundary of a grid cell. The cause is that the center offset within a cell is produced by a sigmoid, whose output can approach but never reach 0 or 1, so the decoded center can never exactly equal a grid boundary coordinate.

What is YOLOv4 and Object Detection?

Before we dive deeper into Grid Sensitive, it's important to understand what YOLOv4 is and how it relates to object detection. YOLO (You Only Look Once) is a popular object detection algorithm that works by dividing an image into a grid and predicting bounding boxes and class probabilities for each grid cell. This approach differs from other object detection methods, such as R-CNN or Faster R-CNN, which use region proposal methods to identify regions in the image that may contain objects, and then perform classification and bounding box regression on those regions.
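To make the grid idea concrete, here is a minimal sketch of how a box center in pixel coordinates could be mapped to a grid cell and an offset within that cell. The function name `assign_grid_cell`, the stride of 32, and the sample coordinates are assumptions chosen for illustration. The sketch ignores anchors and multi-scale heads, so it is a simplification rather than the assignment logic of any particular YOLO implementation.

```python
import math

def assign_grid_cell(cx: float, cy: float, stride: float):
    """Map a box center (in pixels) to a grid cell index and in-cell offset.

    Hypothetical helper: `stride` is the assumed size of one grid cell in pixels.
    Returns (g_x, g_y, offset_x, offset_y) with offsets in [0, 1).
    """
    g_x = math.floor(cx / stride)
    g_y = math.floor(cy / stride)
    return g_x, g_y, cx / stride - g_x, cy / stride - g_y

# A center at x = 200 falls a quarter of the way into cell 6;
# a center at y = 96 sits exactly on the boundary between cells 2 and 3.
print(assign_grid_cell(200.0, 96.0, 32.0))  # (6, 3, 0.25, 0.0)
```

Note the vertical offset of exactly 0.0 in this example: a center that lies on a grid boundary is the case that matters for the rest of this article.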

Object detection refers to the task of identifying objects in images or videos and placing bounding boxes around them. This is a challenging problem in computer vision that has many practical applications, such as self-driving cars, surveillance, and robotics.

The Problem with YOLOv3

Now, let's go back to the problem with YOLOv3. The equation used to decode the coordinates of the bounding box center was:

$$ \begin{aligned} &x=s \cdot\left(g_{x}+\sigma\left(p_{x}\right)\right) \\ &y=s \cdot\left(g_{y}+\sigma\left(p_{y}\right)\right) \end{aligned} $$

Here, $\sigma$ is the sigmoid function, $p_{x}$ and $p_{y}$ are the raw network outputs for the box center, $g_{x}$ and $g_{y}$ are the integer coordinates of the grid cell, and $s$ is a scale factor that maps grid units to image coordinates. However, this equation has a limitation. Because the sigmoid maps any finite input into the open interval $(0, 1)$, the predicted coordinate $x$ can approach but never exactly equal $s \cdot g_{x}$ or $s \cdot\left(g_{x}+1\right)$, which is where the grid boundaries are located (and likewise for $y$). As a result, the model has difficulty predicting the centers of bounding boxes that lie on a grid boundary.
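The limitation can be seen numerically with a minimal sketch of the decoding equation for a single coordinate. The function names, the stride `s = 32`, and the grid index `g = 5` are assumptions chosen for illustration, not taken from any specific YOLOv3 codebase.

```python
import math

def sigmoid(p: float) -> float:
    return 1.0 / (1.0 + math.exp(-p))

def decode_center_yolov3(p: float, g: int, s: float) -> float:
    """Decode one center coordinate as s * (g + sigmoid(p))."""
    return s * (g + sigmoid(p))

def logit_for_offset(offset: float) -> float:
    """Inverse sigmoid: the raw output p needed to produce a given in-cell offset."""
    return math.log(offset / (1.0 - offset))

s, g = 32.0, 5  # hypothetical scale factor and grid index; the cell spans [160, 192]
for p in (-6.0, 0.0, 6.0):
    # Decoded values stay strictly between the boundaries 160 and 192.
    print(p, decode_center_yolov3(p, g, s))

# The raw output needed grows without bound as the target offset approaches 1.
for offset in (0.9, 0.99, 0.999):
    print(offset, logit_for_offset(offset))
```

Each extra digit of closeness to the boundary costs roughly another 2.3 units of raw output (a factor of ten in the odds), so a target exactly on the boundary corresponds to an infinite logit.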

This problem may seem minor, but it can affect localization accuracy in practice. To place a center close to a grid boundary, the network must produce a raw output of very large magnitude, which is hard to learn. The resulting offset error shifts the whole predicted box and lowers its overlap with the ground truth, which can in turn cause detections to be counted as false positives or missed.

Grid Sensitive to the Rescue

This is where Grid Sensitive comes in. The Grid Sensitive technique adjusts the equation for decoding the bounding box centers to:

$$ \begin{aligned} &x=s \cdot\left(g_{x}+\alpha \cdot \sigma\left(p_{x}\right)-(\alpha-1) / 2\right) \\ &y=s \cdot\left(g_{y}+\alpha \cdot \sigma\left(p_{y}\right)-(\alpha-1) / 2\right) \end{aligned} $$

Here, $\alpha$ is a hyperparameter that controls how far the decoded offset can extend beyond the sigmoid's own range. Setting $\alpha = 1$ recovers the original YOLOv3 equation. With $\alpha > 1$, the offset term $\alpha \cdot \sigma\left(p_{x}\right)-(\alpha-1) / 2$ ranges over the open interval $\left(-(\alpha-1) / 2,\ 1+(\alpha-1) / 2\right)$, which contains both 0 and 1, so a finite network output can place the center exactly on a grid boundary. The term $-(\alpha-1) / 2$ keeps this range centered on the grid cell: $p_{x}=0$ still decodes to the middle of the cell for any $\alpha$.
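A minimal sketch of the adjusted decoding for a single coordinate follows. The function names, the stride `s = 32`, the grid index `g = 5`, and the value `alpha = 1.05` are assumptions chosen for illustration; treat this as a demonstration of the formula rather than a reference implementation.

```python
import math

def sigmoid(p: float) -> float:
    return 1.0 / (1.0 + math.exp(-p))

def decode_center_grid_sensitive(p: float, g: int, s: float, alpha: float = 1.05) -> float:
    """Decode one center coordinate as s * (g + alpha * sigmoid(p) - (alpha - 1) / 2)."""
    return s * (g + alpha * sigmoid(p) - (alpha - 1.0) / 2.0)

def boundary_logit(alpha: float) -> float:
    """Finite raw output at which the decoded offset equals 1 (requires alpha > 1).

    Solving alpha * sigmoid(p) - (alpha - 1) / 2 = 1 gives
    sigmoid(p) = (alpha + 1) / (2 * alpha), i.e. p = ln((alpha + 1) / (alpha - 1)).
    """
    target = (alpha + 1.0) / (2.0 * alpha)
    return math.log(target / (1.0 - target))

s, g, alpha = 32.0, 5, 1.05  # hypothetical values; the cell spans [160, 192]
p_edge = boundary_logit(alpha)
print(p_edge)                                              # finite logit for the upper boundary
print(decode_center_grid_sensitive(p_edge, g, s, alpha))   # approximately 192
print(decode_center_grid_sensitive(-p_edge, g, s, alpha))  # approximately 160
print(decode_center_grid_sensitive(0.0, g, s, alpha))      # approximately 176, the cell center
```

With $\alpha = 1.05$, the boundary is reached at $p = \ln 41 \approx 3.71$, a value the network can produce without saturating. Larger values of $\alpha$ lower this threshold further, at the cost of letting the decoded center stray farther outside its own cell.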

One advantage of the Grid Sensitive technique is that it adds very few FLOPs (floating-point operations) to the model: the change amounts to one extra multiplication and one subtraction per predicted coordinate. The PP-YOLO paper, which adopts the technique from YOLOv4, describes the added FLOPs as small enough to be ignored entirely.

Grid Sensitive is an effective technique for improving localization in grid-based object detectors, particularly for objects whose centers lie on or near grid boundaries. By letting the decoded offset reach those boundaries with finite network outputs, it allows the model to predict such centers more accurately, which can lead to better overall performance. The technique is easy to implement and adds no significant computational overhead, making it a practical choice for many real-world applications.