Object detection is a computer vision technology that localizes and identifies objects in an image. Thanks to its versatility, object detection has emerged in the last few years as one of the most commonly used computer vision technologies.

In this article, we will walk through the following material to give you an idea of what object detection is and how you can start using it for your own use case:

  • How Object Detection Works
  • Object Detection Use Cases
  • Labeling Object Detection Data
  • Data Augmentation for Object Detection
  • Modeling Object Detection Problems

Let's dive in!

How Object Detection Works

Common Misnomers

Object detection is often called object recognition or object identification, and these concepts are synonymous.

Object detection is not, however, akin to other common computer vision technologies such as classification (assigns a single class to an image), keypoint detection (identifies points of interest in an image), or semantic segmentation (separates the image into regions via masks).

If you're interested in the other definitions of common computer vision terms we'll be using, see our Computer Vision Glossary.

The Object Detection Task  

The object detection task localizes objects in an image and labels these objects as belonging to a target class.

Graphical depiction of the object detection task

Object detection models accomplish this goal by predicting X1, X2, Y1, Y2 coordinates and Object Class labels. Using object detection in an application simply involves feeding an image (or video frame) into an object detection model and receiving a JSON output with predicted coordinates and class labels.
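To make the JSON output concrete, here is a minimal sketch of reading predictions in an application. The field names (`predictions`, `x1`, `class`, `confidence`) and the confidence threshold are assumptions for illustration; each model or service defines its own schema.

```python
import json

# Hypothetical response body; real models and services use their own field names.
response_text = """
{
  "predictions": [
    {"x1": 34, "y1": 50, "x2": 210, "y2": 300, "class": "dog", "confidence": 0.91},
    {"x1": 220, "y1": 80, "x2": 400, "y2": 310, "class": "cat", "confidence": 0.42}
  ]
}
"""

def parse_predictions(text, min_confidence=0.5):
    """Return (class, box) pairs whose confidence meets the threshold."""
    data = json.loads(text)
    results = []
    for pred in data["predictions"]:
        if pred["confidence"] >= min_confidence:
            box = (pred["x1"], pred["y1"], pred["x2"], pred["y2"])
            results.append((pred["class"], box))
    return results

detections = parse_predictions(response_text)
for label, (x1, y1, x2, y2) in detections:
    print(f"{label}: top-left=({x1}, {y1}), bottom-right=({x2}, {y2})")
```

Filtering on a confidence score is a common way to drop weak predictions, but the right threshold depends on your model and use case.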

Modeling Object Detection

In order to make these predictions, object detection models form features from the input image pixels.

Forming features from image pixels (source)

After formation, image pixel features are fed through a deep learning network

A diagram of an object detection model (source)

and coordinate and class predictions are made as offsets from a series of anchor boxes.  

Object detection predictions are made based off anchor boxes
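As a rough illustration of what "offsets from an anchor box" means, the sketch below decodes one predicted offset into box corners. It assumes one common parameterization (center shifts scaled by the anchor size, with log-scale width and height); individual architectures differ in the details, so treat this as an example rather than the formula any particular model uses.

```python
import math

def decode_box(anchor, offsets):
    """Turn predicted offsets into a box, relative to one anchor.

    anchor:  (center_x, center_y, width, height) in pixels
    offsets: (dx, dy, dw, dh) as predicted by the model (assumed encoding)
    """
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = offsets
    cx = ax + dx * aw          # shift the center by a fraction of the anchor size
    cy = ay + dy * ah
    w = aw * math.exp(dw)      # scale width and height multiplicatively
    h = ah * math.exp(dh)
    # Convert center/size to corner coordinates (x1, y1, x2, y2).
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

anchor = (100.0, 100.0, 50.0, 80.0)
# Zero offsets reproduce the anchor box itself.
print(decode_box(anchor, (0.0, 0.0, 0.0, 0.0)))
```

Predicting small offsets from a fixed set of anchors is generally an easier learning problem than predicting raw pixel coordinates from scratch.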

The object detection model learns from the data that it is shown. So in order to train an object detection model to detect your objects of interest, it is important to collect a labeled dataset.

Object Detection Use Cases

Object detection is useful in any setting where computer vision is needed to localize and identify objects in an image. Object detection flourishes in settings where objects and scenery are more or less similar.

At Roboflow, we have seen use cases for object detection all over the map of industries. Here are just a few examples:

Example use cases for object detection

In general, object detection use cases can be clustered into the following groups:

  • Aerial and Geospatial Imagery
  • Manufacturing Quality Assurance
  • Anomaly Detection
  • Safety and Surveillance
  • Object Counting
  • Self-Driving Cars

Labeling Object Detection Data

In order to train an object detection model, you must show the model a corpus of labeled data that has your objects of interest labeled with bounding boxes.

Annotating images for object detection in CVAT

Annotating images can be accomplished manually or via services. To get started, you may need to label as few as 10-50 images to get your model off the ground. Going forward, however, more labeled data will generally improve your model's performance and generalizability.

Labeling Your Images Yourself

If you choose to label images yourself, there are a number of free, open source labeling solutions that you can leverage.

Here are some guides for getting started:

I recommend CVAT because it is a powerful tool with a web interface, so no program installs are necessary and you can quickly get into the platform and start labeling images.

Labeling Services

Labeling services leverage crowd workers to label your dataset for you. If you have a very large labeling job, these solutions may be for you.

Some automatic labeling services include:

Labeling Best Practices

As you are gathering your dataset, it is important to think ahead to problems that your model may be facing in the future.

  • Make sure to include plenty of examples of every type of object that you would like to detect.
  • Simplify the object detection task by limiting the variation of environment in your dataset.
  • Label a tight box around the object of interest.
  • Label occluded objects as if the object were fully visible.
  • Label objects that are partially cut off on the edge of the image.
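One way to check the first practice above (plenty of examples of every object type) is to count labeled boxes per class. This is a minimal sketch; the annotation tuples and the `min_examples` cutoff are hypothetical and would come from your own labeling export.

```python
from collections import Counter

# Hypothetical annotations: one (image_name, class_label, box) tuple per labeled object.
annotations = [
    ("img_001.jpg", "car", (10, 20, 110, 90)),
    ("img_001.jpg", "car", (130, 25, 220, 95)),
    ("img_002.jpg", "pedestrian", (40, 30, 70, 120)),
    ("img_003.jpg", "car", (5, 15, 95, 80)),
]

def underrepresented_classes(annotations, min_examples):
    """Return classes that have fewer than min_examples labeled boxes."""
    counts = Counter(label for _, label, _ in annotations)
    return {label: n for label, n in counts.items() if n < min_examples}

print(underrepresented_classes(annotations, min_examples=2))
```

Classes that show up in this report are candidates for collecting and labeling more images before training.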

Data Augmentation for Object Detection

Data augmentation involves generating derivative images from your base training dataset.

Generating more data for object detection via data augmentation

This means that you can spend less time labeling and more time using and improving your object detection model.
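For object detection, an augmentation has to transform the bounding boxes along with the pixels. Here is a minimal sketch of a horizontal flip, using a nested list as a stand-in for an image and assuming boxes are (x1, y1, x2, y2) pixel edges with x1 < x2. A real pipeline would typically use an image library instead.

```python
def flip_horizontal(image, boxes):
    """Mirror an image left-to-right and update its boxes to match.

    image: list of rows, each row a list of pixel values
    boxes: list of (x1, y1, x2, y2) pixel edges, with x1 < x2
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # After mirroring, a box's old right edge becomes its new left edge.
    flipped_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped_image, flipped_boxes

# A 2x4 "image" with a labeled object covering the leftmost column.
image = [
    [9, 0, 0, 0],
    [9, 0, 0, 0],
]
boxes = [(0, 0, 1, 2)]

new_image, new_boxes = flip_horizontal(image, boxes)
print(new_image)
print(new_boxes)
```

The label still covers the same pixels after the flip, so one labeled image yields a second training example at no extra labeling cost.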

Data augmentation strategies include, but are not limited to, the following:

Want to dive in deeper? See this post:

Object Detection Models

Once you have a labeled dataset, and you have made your augmentations, it is time to start training an object detection model.

Training involves showing instances of your labeled data to a model in batches and iteratively improving the way the model is mapping images to predictions.
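The batch-by-batch loop described above can be sketched with a toy model. The example below fits a simple line with mini-batch gradient descent; the linear model, learning rate, and batch size are stand-ins chosen for illustration, and a real object detection network and loss are far more involved, but the overall loop has the same shape.

```python
import random

def train(data, epochs=500, batch_size=4, learning_rate=0.5, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent on squared error."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # show the examples in a new order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Average gradient of the squared error over this batch.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Nudge the parameters to reduce the error on this batch.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Toy "labeled data" generated from y = 3x + 1.
samples = [(x / 10, 3 * (x / 10) + 1) for x in range(10)]
w, b = train(samples)
print(round(w, 2), round(b, 2))
```

In a detection model, the parameters are the network weights and the error combines box and class mistakes, but each step still compares predictions to labels on a batch and adjusts the model accordingly.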

As with labeling, you can take two approaches to training and inferring with object detection models: train and deploy yourself, or use training and inference services.

Train Your Own Object Detection Model

At Roboflow, we are proud hosts of the Roboflow Model Library. Within the model library, you will see documentation and code on how to train and deploy your custom model with various model architectures.

We have also published a series of best-in-class getting started tutorials on how to train your own custom object detection model. As of September 2020, the best object detection models are:

I recommend starting with YOLO v5, as it is the easiest to use off the shelf.

Training your own model is a good way to get hands-on with the object detection prediction engine.

However, you may wish to move more quickly, or you may find that the many techniques and frameworks involved in training and deploying your model are worth outsourcing.

AutoML Object Detection Training and Inference Services

Due to the complexity involved in constructing and deploying an object detection model, an application developer may choose to outsource this portion of the object detection process to an AutoML (Automated Machine Learning) solution.

At Roboflow we spent some time benchmarking common AutoML solutions on the object detection task:

Including

We also have been developing an automatic training and inference solution at Roboflow:

With any of these services, you upload your training images and start training with one click. After training completes, the service will stand up an endpoint where you can send your image and receive predictions.
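Calling such an endpoint usually comes down to an HTTP request. The sketch below uses only the Python standard library; the URL, the raw-bytes upload, and the JSON response are all assumptions, since each service documents its own URL, authentication, and request format.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the URL and auth your service gives you.
ENDPOINT_URL = "https://example.com/v1/detect"

def build_request(image_bytes, url=ENDPOINT_URL):
    """Build an HTTP POST carrying the raw image bytes."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

def detect(image_bytes, url=ENDPOINT_URL):
    """Send the image and decode the JSON body of the response."""
    with urllib.request.urlopen(build_request(image_bytes, url)) as response:
        return json.loads(response.read().decode("utf-8"))

# Building the request does not touch the network; detect() does.
request = build_request(b"fake image bytes")
print(request.get_method(), request.full_url)
```

To use it, read an image file in binary mode and pass its bytes to `detect`, after pointing `ENDPOINT_URL` at a real service.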

Object Detection Models on the Edge

It is becoming increasingly important in many use cases to perform object detection in real time (e.g., at more than 30 FPS).
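To find out whether a model meets a real-time target on your hardware, you can time it over a batch of frames. This is a minimal sketch: `fake_infer` is a placeholder that sleeps instead of running a model, so swap in your own inference call.

```python
import time

def measure_fps(infer, frames):
    """Run infer on every frame and return frames processed per second."""
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

def fake_infer(frame):
    """Placeholder for a real model call; sleeps to simulate work."""
    time.sleep(0.005)

fps = measure_fps(fake_infer, frames=list(range(20)))
print(f"{fps:.1f} FPS, meets 30 FPS target: {fps >= 30}")
```

When benchmarking a real model, it helps to include image loading and preprocessing in the timed call, since they count toward end-to-end latency on a live video feed.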

A number of hardware solutions have popped up around the need to run object detection models on the edge including:

We have also published some guides on deploying your custom object detection model to the edge including:

Object detection inference on video feed

Computer Vision Workflow

It's important to set up a computer vision pipeline that your team can use to standardize your computer vision workflow, so you're not reinventing the wheel by writing one-off Python scripts for things like converting annotation formats, analyzing dataset quality, preprocessing images, versioning, and distributing your datasets.

Luckily, Roboflow is a computer vision dataset management platform that productionizes all of these things for you so that you can focus on the unique challenges specific to your data, domain, and model.

It's free to get started with our cloud-based computer vision workflow tool.
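For a sense of what one of those one-off scripts looks like, here is a minimal sketch of a single annotation format conversion: corner coordinates in pixels to the normalized center format used by YOLO-style label files. The function name is our own, and a complete converter would also handle class IDs and file reading and writing.

```python
def corners_to_yolo(box, image_width, image_height):
    """Convert (x1, y1, x2, y2) in pixels to normalized (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / image_width    # box center, as a fraction of image width
    cy = (y1 + y2) / 2 / image_height   # box center, as a fraction of image height
    w = (x2 - x1) / image_width         # box width, as a fraction of image width
    h = (y2 - y1) / image_height        # box height, as a fraction of image height
    return (cx, cy, w, h)

# A 200x200 pixel box inside a 400x500 pixel image.
yolo_box = corners_to_yolo((100, 50, 300, 250), image_width=400, image_height=500)
print(yolo_box)
```

Each format has its own variation on this arithmetic, which is why conversion scripts tend to multiply across a team.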

Conclusion

Object detection is a powerful, cutting-edge computer vision technology that localizes and identifies objects in an image.

In this article, we have covered the gamut of object detection tools and technologies, from labeling images, to augmenting images, to training object detection models, to deploying object detection models for inference.

We hope you enjoyed this article and, as always, happy detecting! We'll be continually updating this post as new models and techniques become available.