Detecting small objects is one of the most challenging and important problems in computer vision. In this post, we will discuss some of the strategies we have developed at Roboflow by iterating on hundreds of small object detection models.

Small objects as seen from above by drone in the public aerial maritime dataset

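To improve your model's performance on small objects, we recommend the following techniques:

- Increasing your image capture resolution
- Increasing your model's input resolution
- Tiling your images
- Generating more data via augmentation
- Auto learning model anchors
- Filtering out extraneous classes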

Why is the Small Object Problem Hard?

The small object problem plagues object detection models worldwide. Not buying it? Check the COCO evaluation results for the recent state-of-the-art models YOLOv3, EfficientDet, and YOLOv4:

Check out AP_S, AP_M, and AP_L (average precision on small, medium, and large objects) for state-of-the-art models. Small objects are hard! (cite)

In EfficientDet, for example, mean average precision (mAP) on small objects is only 12%, compared with 51% for large objects. That is more than a fourfold difference!
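To check how much of your own dataset falls into the "small" bucket, you can apply COCO's size cutoffs: small objects have an area below 32x32 pixels, medium objects lie between 32x32 and 96x96, and large objects are above 96x96. The sketch below is a minimal illustration, assuming your labels are available as (width, height) pairs in pixels; the function names are hypothetical, and because COCO measures segmentation area rather than box area, width times height is only an approximation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

SMALL_MAX_AREA = 32 ** 2   # COCO: "small" objects have area below 32x32 pixels
MEDIUM_MAX_AREA = 96 ** 2  # COCO: "medium" objects lie between 32x32 and 96x96


def size_bucket(width: float, height: float) -> str:
    """Bucket one box by area. COCO uses segmentation area; width * height approximates it."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def coco_size_breakdown(boxes: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Fraction of (width, height) boxes, in pixels, that fall into each size bucket."""
    counts = Counter(size_bucket(w, h) for w, h in boxes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts[name] / total for name in ("small", "medium", "large")}


print(coco_size_breakdown([(10, 12), (40, 40), (200, 150), (20, 30)]))
# {'small': 0.5, 'medium': 0.25, 'large': 0.25}
```

If a large share of your labels lands in "small", expect your overall mAP to track the AP_S column above more closely than AP_L.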

So why is detecting small objects so hard?

It all comes down to the model. Object detection models form features by aggregating pixels in convolutional layers.

Feature aggregation for object detection in PP-YOLO
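One way to see why aggregation hurts small objects is to count how many output cells an object spans. YOLO-family detectors predict at output strides of 8, 16, and 32 pixels, so each output cell summarizes a stride x stride patch of the input. The sketch below simply divides box size by stride; it ignores receptive-field effects, and `cells_covered` is a hypothetical helper name.

```python
def cells_covered(box_w: float, box_h: float, stride: int) -> float:
    """Approximate number of feature-map cells a box spans at a given output stride."""
    return (box_w / stride) * (box_h / stride)


for stride in (8, 16, 32):
    small = cells_covered(12, 12, stride)
    large = cells_covered(200, 200, stride)
    print(f"stride {stride:>2}: 12x12 box -> {small:.2f} cells, 200x200 box -> {large:.1f} cells")
```

At stride 32, a 12x12 object covers only about a seventh of a single cell, while a 200x200 object still spans dozens of cells.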

At the end of the network the model makes its predictions, and during training a loss function sums the differences between those predictions and the ground truth across the output grid.

The loss function in YOLO

If the ground truth box is small, it contributes only a small signal to the loss during training.
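To make the "small signal" point concrete, here is a deliberately simplified loss: the mean squared error between a predicted objectness map and a binary ground-truth mask, averaged over every pixel of a 416x416 image. This is a toy stand-in, not YOLO's actual loss, and the helper names are hypothetical; it only shows how little a completely missed small object moves a pixel-averaged objective compared with a missed large one.

```python
def box_mask(img_w, img_h, x0, y0, w, h):
    """Binary ground-truth map: 1.0 inside the box, 0.0 elsewhere."""
    return [
        [1.0 if (x0 <= x < x0 + w and y0 <= y < y0 + h) else 0.0 for x in range(img_w)]
        for y in range(img_h)
    ]


def pixel_mse(pred, target):
    """Mean squared error averaged over every pixel of the map (toy loss, not YOLO's)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


IMG = 416
empty_prediction = [[0.0] * IMG for _ in range(IMG)]  # the model misses the object entirely

small_loss = pixel_mse(empty_prediction, box_mask(IMG, IMG, 100, 100, 8, 8))
large_loss = pixel_mse(empty_prediction, box_mask(IMG, IMG, 100, 100, 200, 200))
print(f"missed 8x8 object:     loss = {small_loss:.5f}")
print(f"missed 200x200 object: loss = {large_loss:.5f}")
print(f"the large object moves the loss {large_loss / small_loss:.0f}x more")
```

Missing the 8x8 object barely changes the loss, so the gradient pushing the model to find it is correspondingly weak.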

Furthermore, small objects are the most likely to suffer from labeling errors, such as being omitted entirely.

Empirically and theoretically, small objects are hard.

Increasing your image capture resolution

Resolution, resolution, resolution... it is all about resolution.

Very small objects may span only a few pixels inside their bounding box, so it is very important to increase the resolution of your images to enrich the features your detector can form from that small box.

Therefore, we suggest capturing images at the highest resolution you can.
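As a back-of-the-envelope check, an object's size in pixels grows linearly with capture resolution along each axis. The sketch below assumes a hypothetical square object spanning 1.5% of the frame width (an arbitrary example value, with framing held constant) and compares its pixel footprint against COCO's 32x32 "small" cutoff; `object_size_px` is a hypothetical helper name.

```python
COCO_SMALL_SIDE = 32  # COCO counts objects with area below 32x32 pixels as "small"


def object_size_px(frame_fraction: float, image_width_px: int) -> float:
    """Side length in pixels of a square object spanning `frame_fraction` of the image width."""
    return frame_fraction * image_width_px


# Hypothetical square object spanning 1.5% of the frame width.
for width in (640, 1920, 4000):
    side = object_size_px(0.015, width)
    label = "small" if side * side < COCO_SMALL_SIDE ** 2 else "not small"
    print(f"{width:>4} px wide capture: object is ~{side:.1f} px across ({label} by COCO's area cutoff)")
```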

Increasing your model's input resolution

Once you have your images at higher resolution, you can scale up your model's input resolution. Warning: this will make training slower and more memory-intensive, and inference slower once you deploy. You may have to run experiments to find the right tradeoff between speed and accuracy.
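For fully convolutional detectors such as the YOLO family, compute and activation memory grow roughly with the number of input pixels, so doubling the side of a square input roughly quadruples the work per image. The estimator below is a rough sketch under that assumption; it ignores fixed overheads such as pre- and post-processing, and `relative_cost` is a hypothetical name.

```python
def relative_cost(new_size: int, base_size: int = 416) -> float:
    """Rough compute/memory multiplier for a square input, assuming cost scales with pixel count."""
    return (new_size / base_size) ** 2


for size in (416, 608, 832, 1024):
    print(f"{size}x{size}: roughly {relative_cost(size):.1f}x the work of 416x416")
```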

In our tutorial on training YOLOv4, you can scale your input resolution by changing the width and height in the config file:

[net]
batch=64
subdivisions=36
width={YOUR RESOLUTION WIDTH HERE}
height={YOUR RESOLUTION HEIGHT HERE}
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue = .1

learning_rate=0.001
burn_in=1000
max_batches=6000
policy=steps
steps=4800.0,5400.0
scales=.1,.1
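Darknet's YOLOv4 expects the network width and height to be multiples of 32. If you edit configs programmatically, a small helper can round your target resolution and patch the `[net]` section. This is a sketch that assumes `width=` and `height=` appear once each at the start of a line, as in the config above; the function names are hypothetical.

```python
import re


def round_to_multiple(value: int, multiple: int = 32) -> int:
    """Round up to the nearest multiple (YOLOv4 expects width/height divisible by 32)."""
    return ((value + multiple - 1) // multiple) * multiple


def set_cfg_resolution(cfg_text: str, width: int, height: int) -> str:
    """Rewrite the first width= and height= lines of a Darknet .cfg (the [net] section)."""
    width, height = round_to_multiple(width), round_to_multiple(height)
    cfg_text = re.sub(r"(?m)^width\s*=.*$", f"width={width}", cfg_text, count=1)
    cfg_text = re.sub(r"(?m)^height\s*=.*$", f"height={height}", cfg_text, count=1)
    return cfg_text


cfg = "[net]\nbatch=64\nwidth=416\nheight=416\nchannels=3\n"
print(set_cfg_resolution(cfg, 1000, 600))  # width becomes 1024, height becomes 608
```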

In our tutorial on training YOLOv5, you can likewise scale your input resolution by changing the image size parameter (--img) in the training command:

!python train.py --img {YOUR RESOLUTION SIZE HERE} --batch 16 --epochs 10 --data '../data.yaml' --cfg ./models/custom_yolov5s.yaml --weights '' --name yolov5s_results --cache

Note: raising the input resolution only helps up to the native resolution of your training images. Upscaling beyond that adds pixels but no new detail.
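Following that note, one way to pick an upper bound for `--img` is to look at the native sizes of your training images. YOLOv5 resizes each training image so that its longest side matches the image size, and adjusts `--img` to a multiple of the model's maximum stride (32 for the standard models). The sketch below takes (width, height) pairs, which you would read from your own files (that step is omitted), and rounds the longest side down to a multiple of 32; `max_useful_img_size` is a hypothetical name.

```python
from typing import Iterable, Tuple


def max_useful_img_size(image_sizes: Iterable[Tuple[int, int]], stride: int = 32) -> int:
    """Largest --img value (a multiple of `stride`) not exceeding the longest native image side."""
    longest = max(max(w, h) for w, h in image_sizes)
    return max(stride, (longest // stride) * stride)


# Hypothetical native (width, height) sizes of your training images.
sizes = [(1280, 720), (1920, 1080), (640, 480)]
print(max_useful_img_size(sizes))  # 1920
```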

Tiling your images

Another great tactic for detecting small objects is to tile your images as a preprocessing step. Tiling effectively zooms your detector in on small objects, while letting you keep the small input resolution you need for fast inference.
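The core of tiling is computing overlapping windows that cover the full image. The sketch below is one possible scheme, not necessarily the one Roboflow Pro uses: the 416-pixel tile size and 20% overlap are arbitrary example values, overlap helps objects that straddle a tile border appear whole in at least one tile, and the last tile in each row and column is shifted to sit flush with the image edge.

```python
from typing import List, Tuple


def tile_coords(img_w: int, img_h: int, tile: int = 416, overlap: float = 0.2) -> List[Tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) windows covering the image with the given fractional overlap."""
    step = max(1, int(tile * (1 - overlap)))

    def starts(length: int) -> List[int]:
        if length <= tile:
            return [0]  # the image fits in a single tile along this axis
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # final tile flush with the edge
        return positions

    return [
        (x, y, min(x + tile, img_w), min(y + tile, img_h))
        for y in starts(img_h)
        for x in starts(img_w)
    ]


for window in tile_coords(1000, 600):
    print(window)
```

Each window is then cropped out of the image (for a NumPy-style array, `image[y0:y1, x0:x1]`) and labeled or passed to the detector as its own image.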

Tiling images as a preprocessing step in Roboflow Pro

If you use tiling during training, it is important to remember that you will also need to tile your images at inference time.
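At inference time, each tile's detections come back in tile-local coordinates, so they must be shifted by the tile's offset and de-duplicated where tiles overlap. The sketch below assumes you already ran your detector per tile (that step is omitted) and merges the results with a simple class-agnostic greedy non-maximum suppression; the function names and the 0.5 IoU threshold are illustrative choices, not a specific library's API.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_tile_detections(
    tile_results: Sequence[Tuple[Tuple[int, int], Sequence[Tuple[Box, float]]]],
    iou_thresh: float = 0.5,
) -> List[Tuple[Box, float]]:
    """tile_results: [((tile_x0, tile_y0), [(box_in_tile_coords, score), ...]), ...]."""
    detections = []
    for (tx, ty), boxes in tile_results:
        for (x0, y0, x1, y1), score in boxes:
            # Shift from tile-local to full-image coordinates.
            detections.append(((x0 + tx, y0 + ty, x1 + tx, y1 + ty), score))
    detections.sort(key=lambda d: d[1], reverse=True)
    kept: List[Tuple[Box, float]] = []
    for box, score in detections:
        # Greedy NMS: keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(box, k) < iou_thresh for k, _ in kept):
            kept.append((box, score))
    return kept


results = [
    ((0, 0), [((340, 10, 380, 50), 0.9)]),
    ((332, 0), [((10, 12, 48, 50), 0.8), ((100, 100, 120, 120), 0.7)]),
]
print(merge_tile_detections(results))
```

Here the same object was detected in two overlapping tiles, and only the higher-scoring copy survives.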

Generating More Data Via Augmentation

Data augmentation generates new images from your base dataset. This can be very useful to prevent your model from overfitting to the training set.

Some especially useful augmentations for small object detection include random crop, random rotation, and mosaic augmentation.
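Random crop is useful here because a crop that is later resized to the network input enlarges everything inside it, small objects included. The fiddly part is keeping the labels consistent. The sketch below picks a random window and re-expresses the boxes in crop coordinates, dropping any box that keeps less than half its area (an arbitrary threshold); the pixel crop itself is just a slice such as `image[cy0:cy1, cx0:cx1]` for a NumPy-style array. `random_crop_boxes` is a hypothetical name, not a library function.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def random_crop_boxes(
    img_w: int,
    img_h: int,
    boxes: Sequence[Box],
    crop_w: int,
    crop_h: int,
    rng: random.Random,
    min_visible: float = 0.5,
) -> Tuple[Box, List[Box]]:
    """Choose a random crop window and return it with the boxes re-expressed in crop coordinates.

    Boxes are clipped to the window; a box is dropped if less than `min_visible` of its area survives.
    """
    cx0 = rng.randint(0, img_w - crop_w)  # randint is inclusive on both ends
    cy0 = rng.randint(0, img_h - crop_h)
    cx1, cy1 = cx0 + crop_w, cy0 + crop_h
    kept: List[Box] = []
    for x0, y0, x1, y1 in boxes:
        nx0, ny0 = max(x0, cx0), max(y0, cy0)
        nx1, ny1 = min(x1, cx1), min(y1, cy1)
        if nx1 <= nx0 or ny1 <= ny0:
            continue  # the box lies entirely outside the crop
        visible = (nx1 - nx0) * (ny1 - ny0) / ((x1 - x0) * (y1 - y0))
        if visible >= min_visible:
            kept.append((nx0 - cx0, ny0 - cy0, nx1 - cx0, ny1 - cy0))
    return (cx0, cy0, cx1, cy1), kept


rng = random.Random(0)
window, crop_boxes = random_crop_boxes(1000, 800, [(10, 10, 30, 30), (500, 400, 520, 420)], 416, 416, rng)
print(window, crop_boxes)
```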

Auto Learning Model Anchors

Anchor boxes are prototype bounding boxes that your model learns to predict offsets relative to. Preset anchors can be suboptimal for your training data, so it is good to tune them to the task at hand. Thankfully, YOLOv5 does this for you automatically: at the start of training it checks how well the default anchors fit your labels and, if the fit is poor, generates new ones from your custom data. All you have to do is kick off training.

Analyzing anchors... anchors/target = 4.66, Best Possible Recall (BPR) = 0.9675. Attempting to generate improved anchors, please wait...
WARNING: Extremely small objects found. 35 of 1664 labels are < 3 pixels in width or height.
Running kmeans for 9 anchors on 1664 points...
thr=0.25: 0.9477 best possible recall, 4.95 anchors past thr
n=9, img_size=416, metric_all=0.317/0.665-mean/best, past_thr=0.465-mean: 18,24,  65,37,  35,68,  46,135,  152,54,  99,109,  66,218,  220,128,  169,228
Evolving anchors with Genetic Algorithm: fitness = 0.6825: 100%|██████████| 1000/1000 [00:00<00:00, 1081.71it/s]
thr=0.25: 0.9627 best possible recall, 5.32 anchors past thr
n=9, img_size=416, metric_all=0.338/0.688-mean/best, past_thr=0.476-mean: 13,20,  41,32,  26,55,  46,72,  122,57,  86,102,  58,152,  161,120,  165,204
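The log above reports "best possible recall" (BPR). As we understand YOLOv5's autoanchor check, a label counts as recallable if some anchor's width and height are both within a factor of 4 of the label's (the log prints this threshold as its reciprocal, thr=0.25). The sketch below reimplements that ratio check together with a plain k-means over label widths and heights in pure Python. It is a simplified illustration, not YOLOv5's code: YOLOv5 also scales widths and heights by their standard deviation before k-means and then evolves the anchors with a genetic algorithm, as the log shows, and the deterministic area-based initialization here is our own choice.

```python
from typing import List, Sequence, Tuple

WH = Tuple[float, float]  # (width, height) of a label or anchor, in pixels


def ratio_metric(box: WH, anchor: WH) -> float:
    """Worst side ratio between a label and an anchor, in (0, 1]; 1.0 means same shape and size."""
    rw, rh = box[0] / anchor[0], box[1] / anchor[1]
    return min(rw, 1 / rw, rh, 1 / rh)


def best_possible_recall(boxes: Sequence[WH], anchors: Sequence[WH], thr: float = 4.0) -> float:
    """Fraction of labels with at least one anchor within a factor of `thr` on both sides."""
    hits = sum(1 for b in boxes if max(ratio_metric(b, a) for a in anchors) > 1 / thr)
    return hits / len(boxes)


def kmeans_anchors(boxes: Sequence[WH], k: int, iters: int = 100) -> List[WH]:
    """Plain k-means on (width, height) with Euclidean distance; needs at least k labels.

    Initial centers are spread across the labels sorted by area (a deterministic choice of ours).
    """
    by_area = sorted(boxes, key=lambda b: b[0] * b[1])
    n = len(by_area)
    centers = [by_area[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[WH]] = [[] for _ in range(k)]
        for w, h in boxes:
            nearest = min(range(k), key=lambda j: (w - centers[j][0]) ** 2 + (h - centers[j][1]) ** 2)
            clusters[nearest].append((w, h))
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments stopped changing
        centers = new_centers
    return sorted(centers, key=lambda a: a[0] * a[1])


boxes = [(10, 12), (12, 10), (11, 11), (100, 90), (90, 100), (95, 95)]
anchors = kmeans_anchors(boxes, 2)
print(anchors, best_possible_recall(boxes, anchors), best_possible_recall(boxes, [(11, 11)]))
```

With a single small anchor, the three large labels fall outside the 4x ratio and BPR drops to 0.5; fitting anchors to the data recovers all six.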

Filtering Out Extraneous Classes

Class management is an important technique for improving the quality of your dataset. If one class significantly overlaps with another, you should consider filtering it from your dataset. You may also decide that a particular small object in your dataset is not worth detecting and remove it. You can quickly identify all of these issues with the Advanced Dataset Health Check that is part of Roboflow Pro.

Both class omission and class renaming are possible with Roboflow's ontology management tools.
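If you manage labels yourself, dropping a class from YOLO-format text labels means removing its lines and renumbering the remaining class IDs so they stay contiguous. A minimal sketch, assuming one "class_id x_center y_center width height" line per object; the function name and the example class names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def filter_yolo_labels(
    lines: Iterable[str],
    class_names: List[str],
    drop: Set[str],
    rename: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], List[str]]:
    """Remove dropped classes from YOLO label lines and renumber the remaining class IDs.

    Coordinates pass through unchanged. Returns (new_lines, new_class_names).
    """
    rename = rename or {}
    old_to_new: Dict[int, int] = {}
    new_names: List[str] = []
    for old_id, name in enumerate(class_names):
        if name in drop:
            continue
        old_to_new[old_id] = len(new_names)
        new_names.append(rename.get(name, name))

    new_lines: List[str] = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        old_id = int(parts[0])
        if old_id in old_to_new:
            new_lines.append(" ".join([str(old_to_new[old_id])] + parts[1:]))
    return new_lines, new_names


labels = ["0 0.50 0.50 0.10 0.10", "1 0.20 0.20 0.02 0.03", "2 0.70 0.70 0.30 0.30"]
print(filter_yolo_labels(labels, ["boat", "buoy", "dock"], drop={"buoy"}, rename={"dock": "pier"}))
```

Remember to update the class list in your dataset config (for example the names in data.yaml) to match the returned names.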

Conclusion

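Properly detecting small objects is truly a challenge. In this post, we have discussed a few strategies for improving your small object detector, namely:

- Increasing your image capture resolution
- Increasing your model's input resolution
- Tiling your images
- Generating more data via augmentation
- Auto learning model anchors
- Filtering out extraneous classes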

As always, happy detecting!