Object detection is a computer vision technology that localizes and identifies objects in an image. Because of its versatility, object detection has emerged in the last few years as one of the most commonly used computer vision technologies.
In this article, we will walk through the following material to give you an idea of what object detection is and how you can start using it for your own use case:
- How object detection works
- Where object detection is used
- How to label data for object detection models
- Data augmentation best practices for object detection
- How to deploy an object detection model
Let's dive in!
What is object detection?
Object detection is a computer vision solution that identifies instances of objects in visual media. Object detection programs draw a bounding box around an instance of a detected object, paired with a label to represent the contents of the box. For example, a person in an image might be labelled "person" and a car might be labelled "vehicle".
In the following video, we discuss what object detection is in one minute:
How object detection works
Object detection is often called object recognition, object identification, or image detection; these terms are used synonymously.
Object detection is not, however, akin to other common computer vision technologies such as classification (assigns a single class to an image), keypoint detection (identifies points of interest in an image), or semantic segmentation (separates the image into regions via masks).
If you're interested in the other definitions of common computer vision terms we'll be using, see our Computer Vision Glossary.
Object detection programs localize objects in an image and label these objects as belonging to a target class.

Object detection models accomplish this goal by predicting X1, X2, Y1, Y2 coordinates (the corners of a bounding box) and object class labels. Using object detection in an application simply involves inputting an image (or video frame) into an object detection model and receiving a JSON output with predicted coordinates and class labels.
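To make the output format concrete, here is a minimal sketch of reading such a JSON response in Python. The field names (`predictions`, `x1`, `y1`, `x2`, `y2`, `class`, `confidence`) and the confidence threshold are assumptions for illustration; each model or service defines its own schema.

```python
import json

# Hypothetical response body; real services use their own field names.
response_text = """
{
  "predictions": [
    {"x1": 34, "y1": 50, "x2": 210, "y2": 300, "class": "person", "confidence": 0.91},
    {"x1": 250, "y1": 120, "x2": 480, "y2": 290, "class": "vehicle", "confidence": 0.78}
  ]
}
"""

def parse_predictions(text, min_confidence=0.5):
    """Return (label, box, confidence) tuples at or above a confidence threshold."""
    data = json.loads(text)
    results = []
    for pred in data["predictions"]:
        if pred["confidence"] < min_confidence:
            continue  # drop low-confidence boxes
        box = (pred["x1"], pred["y1"], pred["x2"], pred["y2"])
        results.append((pred["class"], box, pred["confidence"]))
    return results

detections = parse_predictions(response_text)
for label, box, confidence in detections:
    print(f"{label}: box={box}, confidence={confidence:.2f}")
```

Raising `min_confidence` trades missed objects for fewer false positives, so the right value depends on your use case.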
Modeling object detection
In order to make these predictions, object detection models form features from the input image pixels.

After formation, image pixel features are fed through a deep learning network, and coordinate and class predictions are made as offsets from a series of anchor boxes.
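As a rough illustration of what "offsets from anchor boxes" means, the sketch below decodes one predicted offset into a box. It assumes one common parameterization (center shifts scaled by the anchor size, and log-scale width and height); individual model families differ in the details, so treat this as an example rather than the formula any particular model uses.

```python
import math

def decode_box(anchor, offsets):
    """Turn predicted offsets into corner coordinates.

    anchor:  (cx, cy, w, h) anchor box in pixels, center format.
    offsets: (tx, ty, tw, th) values predicted by the network (assumed encoding).
    """
    a_cx, a_cy, a_w, a_h = anchor
    tx, ty, tw, th = offsets
    cx = a_cx + tx * a_w      # shift the center by a fraction of the anchor width
    cy = a_cy + ty * a_h      # shift the center by a fraction of the anchor height
    w = a_w * math.exp(tw)    # scale the width
    h = a_h * math.exp(th)    # scale the height
    # Convert from center format to (x1, y1, x2, y2) corners.
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

anchor = (100.0, 100.0, 50.0, 80.0)
unchanged = decode_box(anchor, (0.0, 0.0, 0.0, 0.0))  # zero offsets return the anchor itself
shifted = decode_box(anchor, (0.2, -0.1, 0.0, 0.0))   # same size, moved right and up
print(unchanged, shifted)
```

Predicting small offsets relative to a fixed set of anchors gives the network a reasonable starting guess for each box instead of asking it to regress raw pixel coordinates from scratch.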

An object detection model learns from the data that it is shown. To train a model to detect your objects of interest, you therefore need to collect a labeled dataset.
Object detection use cases
Object detection is useful in any setting where computer vision is needed to localize and identify objects in an image. It performs best in settings where objects and scenery stay more or less consistent from image to image.
At Roboflow, we have seen use cases for object detection all over the map of industries. Here are just a few examples:

Let's talk through a specific use case. Consider a situation where you want to verify that all of the keys for security doors have been returned by the end of the day. Computer vision can solve this problem: a camera pointed at the place where keys are stored can use an object detection model to count the number of keys present at the end of the day.
If there are fewer keys than expected, a manager can be notified so that the situation can be addressed. This is just one of the many situations in which object detection can be helpful.
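The key-counting logic can be sketched in a few lines. The prediction format, the `"key"` label, and the confidence threshold below are assumptions for illustration; the predictions themselves would come from whichever detection model you run on the camera frame.

```python
from collections import Counter

def count_objects(predictions, min_confidence=0.5):
    """Count detections per class, ignoring low-confidence boxes."""
    return Counter(
        p["class"] for p in predictions if p["confidence"] >= min_confidence
    )

def missing_keys(predictions, expected_keys, label="key"):
    """Return how many keys are missing compared with the expected count."""
    found = count_objects(predictions)[label]
    return max(expected_keys - found, 0)

# Hypothetical end-of-day predictions for one camera frame.
frame_predictions = [
    {"class": "key", "confidence": 0.93},
    {"class": "key", "confidence": 0.88},
    {"class": "key", "confidence": 0.31},  # too uncertain to count
    {"class": "hook", "confidence": 0.97},
]

missing = missing_keys(frame_predictions, expected_keys=4)
if missing:
    print(f"Notify manager: {missing} key(s) not returned")
```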
In general, object detection use cases can be clustered into the following groups:
- Aerial and Geospatial Imagery (e.g., for Agriculture)
- Drones
- Manufacturing Quality Assurance
- Anomaly Detection
- Safety and Surveillance
- Medical Imaging
- Object Counting
- Self-Driving Cars
- Retail
- Ecommerce
- Supply Chain
- Finance
For more inspiration and examples, see our computer vision project showcase.
How to label object detection data
To train an object detection model, you must show the model a corpus of labeled data in which your objects of interest are labeled with bounding boxes.
Annotating images can be accomplished manually or via services. To get started, you may need to label as few as 10-50 images to get your model off the ground. Going forward, however, more labeled data will generally improve your model's performance and generalizability.
Labeling images for an object detection model
If you choose to label images yourself, there are a number of free, open source labeling solutions that you can leverage.
Here are some guides for getting started:
- Getting Started with CVAT Tutorial
- Getting Started with LabelImg Tutorial
- Getting Started with VGG Image Annotator (VIA) Tutorial
- Getting Started with LabelMe Tutorial
- Getting Started with VoTT Tutorial
And yours truly:
We recommend CVAT or Roboflow Annotate because they are powerful tools with a web interface, so no program installs are necessary and you can start labeling images quickly.
Labeling services
Labeling services leverage crowd workers to label your dataset for you. If you have a very large labeling job, these solutions may be for you.
Some automatic labeling services include:
Labeling best practices for object detection
As you are gathering your dataset, it is important to think ahead to problems that your model may be facing in the future.
- Make sure to include plenty of examples of every type of object that you would like to detect.
- Simplify the object detection task by limiting the variation of environment in your dataset.
- Label a tight box around the object of interest.
- Label occluded objects as if the object were fully visible.
- Label objects that are partially cut off on the edge of the image.
- Think about your ontology structure before you get started and make sure all your labelers are on the same page.
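A quick script can help check the first and last points above before you train. The sketch below counts labeled boxes per class and flags classes with few examples; the annotation structure and the `min_examples` threshold are hypothetical, so adapt them to whatever your labeling tool exports.

```python
from collections import Counter

# Hypothetical annotations: one (image_name, class_label) pair per labeled box.
annotations = [
    ("img_001.jpg", "key"),
    ("img_001.jpg", "key"),
    ("img_002.jpg", "key"),
    ("img_002.jpg", "keyring"),
    ("img_003.jpg", "Key"),  # inconsistent capitalization, a common ontology slip
]

def class_counts(annotations):
    """Count how many boxes were labeled for each class."""
    return Counter(label for _, label in annotations)

def underrepresented(annotations, min_examples=3):
    """Return classes with fewer than min_examples labeled boxes, sorted by name."""
    counts = class_counts(annotations)
    return sorted(label for label, n in counts.items() if n < min_examples)

print(class_counts(annotations))
print(underrepresented(annotations))
```

A class that appears only once or twice is often a sign of a labeling mismatch (as with "Key" versus "key" here) rather than a truly rare object.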
Data augmentation for object detection
Data augmentation involves generating derivative images from your base training dataset.

Because each derivative image reuses the labels of its source image, you can spend less time labeling and more time using and improving your object detection model.
- Getting Started with Data Augmentation for Object Detection
- Quantifying the Impact of Data Augmentation
Data augmentation strategies include, but are not limited to, the following:
- Flip Augmentation
- Blur Augmentation
- Random Crop Augmentation
- Random Rotate Augmentation
- Mosaic Data Augmentation
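One detail that sets augmentation for object detection apart is that the bounding boxes must be transformed along with the pixels. Here is a minimal sketch of a horizontal flip, assuming the image is a plain list of pixel rows and boxes are `(x1, y1, x2, y2)` in pixels with `x` measured from the left edge; real pipelines typically use an image library, but the box arithmetic is the same.

```python
def horizontal_flip(image, boxes):
    """Mirror an image left-to-right and update its bounding boxes.

    image: list of rows, each row a list of pixel values.
    boxes: list of (x1, y1, x2, y2) tuples in pixel coordinates.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # The old right edge becomes the new left edge, and vice versa.
    flipped_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped_image, flipped_boxes

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
boxes = [(0, 0, 1, 2)]  # a box hugging the left edge

flipped_image, flipped_boxes = horizontal_flip(image, boxes)
print(flipped_image)
print(flipped_boxes)
```

Flipping twice should return the original image and boxes, which makes for an easy sanity check on any augmentation of this kind.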
Want to dive in deeper? See this post:
What models are used for object detection?
There is a wide range of open-source object detection models available. A popular choice is the YOLO (You Only Look Once) family of models, which continues to rank among the state of the art in object detection tasks.
Once you have a labeled dataset, and you have made your augmentations, it is time to start training an object detection model.
Training involves showing instances of your labeled data to a model in batches and iteratively improving the way the model is mapping images to predictions.
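The batch-by-batch loop looks the same regardless of model size. The sketch below is deliberately not an object detector: it fits a toy line `y = w * x + b` with mini-batch gradient descent, purely to show the "show a batch, measure the error, nudge the parameters" cycle. The dataset, learning rate, and batch size are arbitrary choices for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Toy "labeled dataset": inputs x with targets y = 2x + 1.
data = [(i / 19, 2 * (i / 19) + 1) for i in range(20)]

def mean_squared_error(w, b, samples):
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)

def train(data, epochs=300, batch_size=4, learning_rate=0.1):
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # present the data in a new order each epoch
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            errors = [(w * x + b - y, x) for x, y in batch]
            # Gradients of the mean squared error over this batch.
            grad_w = sum(2 * e * x for e, x in errors) / len(batch)
            grad_b = sum(2 * e for e, _ in errors) / len(batch)
            # Step the parameters against the gradient.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

w, b = train(data)
print(f"w={w:.3f}, b={b:.3f}, loss={mean_squared_error(w, b, data):.6f}")
```

An object detection model replaces the two parameters with millions of network weights and the squared error with a loss over box coordinates and class labels, but the loop structure carries over.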
As with labeling, you can take two approaches to training and inference with object detection models: train and deploy yourself, or use training and inference services like Roboflow Train and Roboflow Deploy, both of which are free for Public plans.
Train your own object detection model
At Roboflow, we are proud hosts of the Roboflow Model Library. Within the model library, you will see documentation and code on how to train and deploy your custom model with various model architectures.
We have also published a series of best-in-class getting started tutorials on how to train your own custom object detection model. As of August 2022, these cover some of the best object detection models:
- How to Train YOLOv7 Tutorial
- How to Train YOLOv6 Tutorial
- How to Train YOLOv5 Tutorial
- How to Train YOLOv4 Tutorial
- How to Train YOLOv3 Tutorial
- How to Train Detectron2 Tutorial
- How to Train EfficientDet Tutorial
We recommend training YOLOv5 to start, as it is the easiest to use off the shelf.
If you're deploying to Apple devices like the iPhone or iPad, you may want to give their no-code training tool, CreateML, a try or use the Roboflow mobile SDK.
Training your own model is a good way to get hands-on experience with the object detection prediction engine.
However, you may find that the model training and deployment process is worth outsourcing.
AutoML object detection training and inference services
Due to the complexity involved in constructing and deploying an object detection model, an application developer may choose to outsource this portion of the object detection process to an AutoML (Automated Machine Learning) solution.
At Roboflow, we spent some time benchmarking common AutoML solutions on the object detection task, including:
- AWS Rekognition Custom Labels
- GCP AutoML Vision
- Azure Custom Vision
We have also been developing an automatic training and inference solution at Roboflow:
With any of these services, you input your training images and start training with one click. After training completes, the service will stand up an endpoint where you can send your image and receive predictions.
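Calling a hosted endpoint usually comes down to a single HTTP request. The sketch below builds one with Python's standard library. The URL, the API key, the bearer-token header, and the base64-in-JSON payload are all placeholders chosen for illustration; each service documents its own URL scheme, authentication, and request format.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint and key, for illustration only.
ENDPOINT_URL = "https://example.com/v1/detect"
API_KEY = "YOUR_API_KEY"

def build_request(image_bytes, url=ENDPOINT_URL, api_key=API_KEY):
    """Package an image as a JSON POST request (assumed payload format)."""
    payload = json.dumps(
        {"image": base64.b64encode(image_bytes).decode("ascii")}
    )
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def get_predictions(image_bytes):
    """Send the image and return the parsed JSON reply. Needs a real endpoint."""
    with urllib.request.urlopen(build_request(image_bytes), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))

# Build (but do not send) a request to inspect what would go over the wire.
request = build_request(b"fake image bytes")
print(request.get_method(), request.full_url)
```

In practice you would read the bytes from an image file or a camera frame and call `get_predictions` once the placeholders point at a real service.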
Object detection models on the edge
It is becoming increasingly important in many use cases to perform object detection in real time (e.g., at greater than 30 FPS).
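Whether a model clears a real-time bar on your hardware is something you can measure directly. Here is a minimal timing sketch; `fake_infer` is a stand-in for your model's inference call, and the warm-up count is an arbitrary choice.

```python
import time

def measure_fps(infer, frames, warmup=2):
    """Estimate frames per second for an inference function."""
    for frame in frames[:warmup]:
        infer(frame)  # warm-up runs are excluded from the timing
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

def fake_infer(frame):
    """Placeholder for a real model call; just burns a little CPU time."""
    return sum(range(1000))

frames = list(range(50))  # stand-ins for 50 video frames
fps = measure_fps(fake_infer, frames)
print(f"{fps:.1f} FPS, real time at 30 FPS: {fps > 30}")
```

Measure on the device you plan to deploy to, with frames at your real input resolution, since both change the result substantially.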
A number of hardware solutions have popped up around the need to run object detection models on the edge including:
We have also published some guides on deploying your custom object detection model to the edge including:
- Deploy Computer Vision to Webcam
- One-Click Deploy Model to OAK
- One-Click Deploy to NVIDIA Jetson
- Deploy a Custom Model to the Luxonis OAK-1
- Deploy a Custom Model (with depth) to the Luxonis OAK-D
- Deploy YOLOv5 to Jetson Xavier NX at 30FPS
- Deploy YOLOv7 to a Jetson Nano
Set up a computer vision workflow
It's important to set up a computer vision pipeline that your team can use to standardize your computer vision workflow. That way, you're not reinventing the wheel by writing one-off Python scripts for things like converting annotation formats, analyzing dataset quality, preprocessing images, versioning, and distributing your datasets.
Luckily, Roboflow is a computer vision dataset management platform that productionizes all of these things for you so that you can focus on the unique challenges specific to your data, domain, and model.
It's free to get started with our cloud-based computer vision workflow tool.
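To give a sense of the one-off scripts mentioned above, here is a small sketch of one annotation format conversion: pixel corner boxes `(x1, y1, x2, y2)` to the normalized center format `(cx, cy, w, h)` used by YOLO-style text labels. The function names are our own, and other formats have their own conventions that would each need similar handling.

```python
def corners_to_yolo(box, image_width, image_height):
    """Convert (x1, y1, x2, y2) in pixels to normalized (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / image_width
    cy = (y1 + y2) / 2 / image_height
    w = (x2 - x1) / image_width
    h = (y2 - y1) / image_height
    return (cx, cy, w, h)

def yolo_to_corners(box, image_width, image_height):
    """Convert normalized (cx, cy, w, h) back to (x1, y1, x2, y2) in pixels."""
    cx, cy, w, h = box
    x1 = (cx - w / 2) * image_width
    y1 = (cy - h / 2) * image_height
    x2 = (cx + w / 2) * image_width
    y2 = (cy + h / 2) * image_height
    return (x1, y1, x2, y2)

yolo_box = corners_to_yolo((50, 100, 150, 300), image_width=200, image_height=400)
round_trip = yolo_to_corners(yolo_box, image_width=200, image_height=400)
print(yolo_box, round_trip)
```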
Conclusion
Object detection is a powerful, cutting-edge computer vision technology that localizes and identifies objects in an image.
In this article, we have covered the gamut of object detection tools and technologies, from labeling images, to augmenting images, to training object detection models, to deploying object detection models for inference.
We hope you enjoyed this guide, and as always, happy detecting! We'll be continually updating this post as new models and techniques become available.
If you're interested in more of this type of content, be sure to subscribe to our YouTube channel for computer vision videos and tutorials.
Frequently Asked Questions
How does object detection compare to instance segmentation?
Object detection algorithms draw bounding boxes on an image to indicate the location of an object, whereas instance segmentation algorithms draw exact boundaries to identify objects. Instance segmentation is more useful when you need a precise boundary around an object.
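The relationship between the two outputs is easy to see in code: a bounding box can always be derived from a mask, but not the other way around. The sketch below assumes a mask stored as a list of rows of 0s and 1s, and returns a box whose `x2` and `y2` are exclusive; both conventions are choices made for this example.

```python
def mask_to_box(mask):
    """Return the tightest (x1, y1, x2, y2) box around a binary mask, or None.

    mask: list of rows, each a list of 0/1 values. x2 and y2 are exclusive.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # the mask contains no object pixels
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)

# A small plus-shaped object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

box = mask_to_box(mask)
mask_area = sum(sum(row) for row in mask)
box_area = (box[2] - box[0]) * (box[3] - box[1])
print(box, mask_area, box_area)
```

The gap between the mask area and the box area is the background that a bounding box includes but a segmentation mask excludes.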
What model architecture is most used for object detection?
Convolutional Neural Networks (CNNs) are commonly used for object detection. YOLO models, as well as detectors built on ResNet or EfficientNet backbones, are among the most common object detection models, and all use a CNN structure.