Balanced Feature Pyramid

The Balanced Feature Pyramid (BFP) is a feature pyramid module used for object detection. Unlike other approaches like FPNs that integrate multi-level features using lateral connections, the BFP strengthens the features using the same deeply integrated balanced semantic features. This results in improved information flow and better object detection results.

How the BFP Works

The BFP pipeline consists of four steps: rescaling, integrating, refining, and strengthening. The features at resolution level $l$ are denoted as $C\_{l}$. The number of multi-level features is denoted as $L$. The indexes of involved lowest and highest levels are denoted as $l\_{min}$ and $l\_{max}$. To integrate multi-level features and preserve their semantic hierarchy, the multi-level features {$C\_{2}, C\_{3}, C\_{4}, C\_{5}$} are first resized to the same size as $C\_{4}$ with interpolation and max-pooling respectively. The balanced semantic features are then obtained by averaging as:

$$ C = \frac{1}{L}\sum^{l\_{max}}\_{l=l\_{min}}C\_{l} $$

The balanced semantic features are then rescaled using the same but reverse procedure, strengthening the original features. The authors note that this procedure does not contain any parameters and helps each resolution obtain equal information from others, proving the effectiveness of the information flow.

The balanced semantic features can be further refined to be more discriminative. The authors found that refinements with convolutions directly and non-local modules work well, but the non-local module works in a more stable way. Therefore, embedded Gaussian non-local attention is utilized as the default refinement method in the BFP. The refining step enhances the integrated features and improves the object detection results further.

Advantages of the BFP

With the BFP method, features from low-level to high-level are aggregated at the same time, resulting in improved object detection results. The outputs are {$P\_{2}, P\_{3}, P\_{4}, P\_{5}$} and can be used for object detection following the same pipeline in FPN. The BFP also does not contain any parameters, making it scalable and easy to implement. The BFP method strengthens the multi-level features using the same deeply integrated balanced semantic features, resulting in improved information flow and better object detection results.

Incorporating the BFP method into object detection algorithms can lead to more accurate and efficient object detection. Its novel approach to improving information flow and aggregating features of different resolutions makes it a promising area of research.