Object detection is a computer vision technology that localizes and identifies objects in an image. Because it is versatile in application, object detection has emerged in the last few years as one of the most commonly used computer vision technologies.
In this article, we will walk through the following material to give you an idea of what object detection is and how you can start using it for your own use case:
- How Object Detection Works
- Object Detection Use Cases
- Labeling Object Detection Data
- Data Augmentation for Object Detection
- Modeling Object Detection Problems
Let's dive in!
How Object Detection Works
Object detection is often called object recognition or object identification, and these concepts are synonymous.
Object detection is not, however, akin to other common computer vision technologies such as classification (assigns a single class to an image), keypoint detection (identifies points of interest in an image), or semantic segmentation (separates the image into regions via masks).
The Object Detection Task
The object detection task localizes objects in an image and labels these objects as belonging to a target class.
Object detection models accomplish this goal by predicting bounding box coordinates (X1, Y1, X2, Y2) and object class labels. Using object detection in an application simply involves inputting an image (or video frame) into an object detection model and receiving a JSON output with predicted coordinates and class labels.
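To make the output side of this concrete, here is a minimal Python sketch that parses a detection response. The JSON layout (a `predictions` list with `x1`, `y1`, `x2`, `y2`, `class`, and `confidence` fields) and the confidence threshold are assumptions made for this example; each model or service defines its own schema.

```python
import json

# Hypothetical response from a detection model. The field names are
# assumptions for this example, not a specific service's schema.
SAMPLE_RESPONSE = """
{
  "predictions": [
    {"x1": 34, "y1": 50, "x2": 210, "y2": 300, "class": "dog", "confidence": 0.91},
    {"x1": 220, "y1": 80, "x2": 400, "y2": 310, "class": "cat", "confidence": 0.42}
  ]
}
"""


def parse_detections(response_text, min_confidence=0.5):
    """Return (class_name, (x1, y1, x2, y2)) pairs at or above a confidence threshold."""
    payload = json.loads(response_text)
    detections = []
    for pred in payload["predictions"]:
        if pred["confidence"] >= min_confidence:
            box = (pred["x1"], pred["y1"], pred["x2"], pred["y2"])
            detections.append((pred["class"], box))
    return detections


if __name__ == "__main__":
    for name, box in parse_detections(SAMPLE_RESPONSE):
        print(name, box)
```

Filtering on a confidence score is a common way to trade off missed objects against false detections, so the threshold is worth tuning for your use case.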
Modeling Object Detection
In order to make these predictions, object detection models form features from the input image pixels.
After formation, image pixel features are fed through a deep learning network, and coordinate and class predictions are made as offsets from a series of anchor boxes.
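To illustrate what "offsets from an anchor box" can mean, the sketch below decodes one predicted offset vector into box corners. It assumes the center/size parameterization popularized by the R-CNN family of detectors; other architectures (YOLO variants, for example) use different formulas, so treat this as one possible scheme rather than how every model works. The function name `decode_box` is made up for this example.

```python
import math


def decode_box(anchor, offsets):
    """Turn predicted offsets into a box, relative to a single anchor.

    anchor:  (center_x, center_y, width, height) of the anchor box
    offsets: (dx, dy, dw, dh) predicted by the network
    Returns (x1, y1, x2, y2) corner coordinates.
    """
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = offsets
    # Shift the anchor center by a fraction of the anchor's size.
    cx = ax + dx * aw
    cy = ay + dy * ah
    # Scale the anchor's width and height; exp keeps them positive.
    w = aw * math.exp(dw)
    h = ah * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


if __name__ == "__main__":
    # All-zero offsets reproduce the anchor box itself.
    print(decode_box((100, 100, 50, 80), (0, 0, 0, 0)))
```

Predicting small offsets from a fixed set of anchors tends to be an easier learning problem than predicting raw pixel coordinates from scratch.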
The object detection model learns from the data that it is shown. So in order to train an object detection model to detect your objects of interest, it is important to collect a labeled dataset.
Object Detection Use Cases
Object detection is useful in any setting where computer vision is needed to localize and identify objects in an image. Object detection flourishes in settings where objects and scenery are more or less similar.
At Roboflow, we have seen use cases for object detection across a wide range of industries.
In general, object detection use cases can be clustered into the following groups:
- Aerial and Geospatial Imagery
- Manufacturing Quality Assurance
- Anomaly Detection
- Safety and Surveillance
- Object Counting
- Self Driving Cars
Labeling Object Detection Data
In order to train an object detection model, you must show the model a corpus of labeled data in which your objects of interest are labeled with bounding boxes.
Annotating images can be accomplished manually or via services. To get started, you may need to label as few as 10-50 images to get your model off the ground. Going forward, however, more labeled data will generally improve your model's performance and generalizability.
Labeling Your Images Yourself
If you choose to label images yourself, there are a number of free, open source labeling solutions that you can leverage.
Here are some guides for getting started:
- Getting Started with CVAT - Annotation for Computer Vision
- Getting Started with VoTT Annotation Tool for Computer Vision
- Getting Started with LabelImg for Labeling Object Detection Data
I recommend CVAT: it has a web interface, so no program installs are necessary and you can quickly start labeling images.
Labeling services leverage crowd workers to label your dataset for you. If you have a very large labeling job, these solutions may be for you.
Some automatic labeling services include:
Labeling Best Practices
As you are gathering your dataset, it is important to think ahead to problems that your model may be facing in the future.
- Make sure to include plenty of examples of every type of object that you would like to detect.
- Simplify the object detection task by limiting the variation of environment in your dataset.
- Label a tight box around the object of interest.
- Label occluded objects as if the object were fully visible.
- Label objects that are partially cut off on the edge of the image.
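Some of these practices can be checked automatically before training. The sketch below is one hypothetical sanity pass over labels stored as `(x1, y1, x2, y2)` pixel boxes: it clips boxes to the image (useful for objects cut off at the edge) and drops boxes that end up with no area. The function name `clean_boxes` and the `min_size` threshold are assumptions for this example.

```python
def clean_boxes(boxes, image_width, image_height, min_size=1):
    """Clip (x1, y1, x2, y2) boxes to the image and drop degenerate ones."""
    cleaned = []
    for x1, y1, x2, y2 in boxes:
        # Put corners in a consistent order in case the box was drawn right-to-left.
        x1, x2 = min(x1, x2), max(x1, x2)
        y1, y2 = min(y1, y2), max(y1, y2)
        # Clip to the image so boxes for partially cut-off objects stay in bounds.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(image_width, x2), min(image_height, y2)
        # Skip boxes that are too small to be a meaningful label.
        if x2 - x1 >= min_size and y2 - y1 >= min_size:
            cleaned.append((x1, y1, x2, y2))
    return cleaned


if __name__ == "__main__":
    labels = [(-10, 20, 50, 60), (90, 90, 120, 130), (30, 30, 30, 40)]
    print(clean_boxes(labels, image_width=100, image_height=100))
```

A check like this does not replace reviewing labels by eye, but it can catch obvious mistakes cheaply.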
Data Augmentation for Object Detection
Data augmentation involves generating derivative images from your base training dataset.
This means that you can spend less time labeling and more time using and improving your object detection model.
Data augmentation strategies include, but are not limited to, the following:
- Flip Augmentation
- Blur Augmentation
- Random Crop Augmentation
- Random Rotate Augmentation
- Mosaic Data Augmentation
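One detail that sets object detection apart from classification is that the bounding boxes must be transformed along with the pixels. As a minimal sketch, here is a horizontal flip written in plain Python, assuming the image is a list of pixel rows and boxes are `(x1, y1, x2, y2)` pixel coordinates with `x1 < x2`. In practice you would more likely use an augmentation library or service; the function name `flip_horizontal` is made up for this example.

```python
def flip_horizontal(image, boxes):
    """Mirror an image left-to-right and update its boxes to match.

    image: list of rows, each row a list of pixel values
    boxes: list of (x1, y1, x2, y2) in pixel coordinates, with x1 < x2
    """
    width = len(image[0])
    # Reverse every row to mirror the pixels.
    flipped_image = [row[::-1] for row in image]
    # The old right edge becomes the new left edge, measured from the other side.
    flipped_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped_image, flipped_boxes


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
    boxes = [(0, 0, 1, 2)]  # a box covering the left-most column
    print(flip_horizontal(image, boxes))
```

Flipping twice returns the original image and boxes, which makes for a handy sanity check when writing your own augmentations.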
Want to dive in deeper? See this post:
Object Detection Models
Once you have a labeled dataset, and you have made your augmentations, it is time to start training an object detection model.
Training involves showing instances of your labeled data to a model in batches and iteratively improving the way the model is mapping images to predictions.
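The shape of that loop is the same whatever the model. The toy sketch below uses a one-variable linear model in place of a detection network (an assumption made purely to keep the example short and runnable): it shuffles the labeled examples, walks through them in batches, and nudges the parameters after each batch to reduce the prediction error. The function name `train` and its default hyperparameters are chosen for illustration.

```python
import random


def train(examples, epochs=500, batch_size=4, learning_rate=0.2, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent on squared error."""
    rng = random.Random(seed)
    examples = list(examples)  # copy so shuffling does not modify the caller's data
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(examples)
        for start in range(0, len(examples), batch_size):
            batch = examples[start:start + batch_size]
            # Average gradient of the squared error over the batch.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Nudge the parameters to reduce the error on this batch.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Labeled data generated from y = 3x + 1, so training should approach w=3, b=1.
    data = [(x / 10, 3 * (x / 10) + 1) for x in range(10)]
    print(train(data))
```

A real detector has millions of parameters and a more involved loss that combines box and class errors, but the batch-by-batch update cycle follows the same pattern.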
As with labeling, you can take two approaches to training and inferring with object detection models - train and deploy yourself, or use training and inference services.
Train Your Own Object Detection Model
At Roboflow, we are proud hosts of the Roboflow Model Library. Within the model library, you will see documentation and code on how to train and deploy your custom model with various model architectures.
We have also published a series of best-in-class getting started tutorials on how to train your own custom object detection model. As of September 2020, these tutorials cover the best object detection models:
- How to Train YOLOv5
- How to Train YOLOv4
- How to Train YOLOv3
- How to Train Detectron2
- How to Train EfficientDet
I recommend training YOLOv5 to start as it is the easiest to start with off the shelf.
Training your own model is a good way to get hands on with the object detection prediction engine.
However, you may wish to move more quickly, or you may find that the many techniques and frameworks involved in modeling and deploying your model are worth outsourcing.
AutoML Object Detection Training and Inference Services
Due to the complexity involved in constructing and deploying an object detection model, an application developer may choose to outsource this portion of the object detection process to an AutoML (Automated Machine Learning) solution.
At Roboflow we spent some time benchmarking common AutoML solutions on the object detection task:
We also have been developing an automatic training and inference solution at Roboflow:
With any of these services, you will input your training images and click Train. After training completes, the service will stand up an endpoint where you can send in your image and receive predictions.
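Calling such an endpoint usually amounts to a single HTTP request. The sketch below uses only the Python standard library. The endpoint URL, the JSON request body with a base64-encoded `image` field, and the helper names `build_request` and `predict` are all assumptions for this example; consult your service's documentation for its actual request format and authentication.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint; substitute the URL your training service gives you.
ENDPOINT_URL = "https://example.com/my-model/predict"


def build_request(image_bytes, url=ENDPOINT_URL):
    """Build (but do not send) a POST request carrying a base64-encoded image."""
    body = json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(image_bytes, url=ENDPOINT_URL):
    """Send the image to the endpoint and return the decoded JSON predictions."""
    with urllib.request.urlopen(build_request(image_bytes, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a real endpoint in place, usage would look like `predict(open("photo.jpg", "rb").read())`, where `photo.jpg` is a placeholder file name.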
Object Detection Models on the Edge
It is becoming increasingly important in many use cases to perform object detection in real time (e.g. at greater than 30 FPS).
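To check whether a model meets a frame rate target on your hardware, you can time it directly. Here is a minimal sketch in which `fake_model` is a stand-in that simply sleeps; swap in your own inference call. Both function names are made up for this example.

```python
import time


def measure_fps(run_inference, frames):
    """Run inference on each frame and return the average frames per second."""
    start = time.perf_counter()
    for frame in frames:
        run_inference(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def fake_model(frame):
    """Stand-in for a real detector; sleeps briefly to simulate work."""
    time.sleep(0.001)
    return []


if __name__ == "__main__":
    fps = measure_fps(fake_model, list(range(50)))
    print(f"{fps:.1f} FPS")
```

When benchmarking a real model, it is worth discarding the first few frames, since initial calls often include one-time setup costs.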
A number of hardware solutions have popped up around the need to run object detection models on the edge including:
We have also published some guides on deploying your custom object detection model to the edge including:
Object Detection is a powerful, cutting edge computer vision technology that localizes and identifies objects in an image.
In this article, we have covered the gamut of object detection tools and technologies, from labeling images, to augmenting images, to training object detection models, to deploying object detection models for inference.
We hope you enjoyed it, and as always, happy detecting! We will continue updating this post as new models and techniques become available.