The Microsoft COCO dataset is the gold standard benchmark for evaluating the performance of state-of-the-art computer vision models. Despite its wide use in the computer vision research community, the COCO dataset is less well known to general practitioners.
In this post, we will dive into the COCO dataset, explaining the motivation for the dataset and exploring dataset facts and metrics.
Why The COCO Dataset?
COCO stands for Common Objects in Context. The dataset is designed to represent a vast array of objects that we regularly encounter in everyday life.
The Computer Vision Benchmark
The COCO dataset is labeled, providing data to train supervised computer vision models that can identify the common objects in the dataset. Of course, these models are still far from perfect, so the COCO dataset also serves as a benchmark for measuring how much these models improve over time through computer vision research.
See this link for a snapshot of the current COCO Dataset leaderboard.
A Checkpoint for Transfer Learning
Another motivation for the COCO dataset is to provide a base dataset to train computer vision models. Once the model is trained on the COCO dataset, it can be fine-tuned to learn other tasks, with a custom dataset.
Here is a tutorial to get started with transfer learning from the COCO dataset.
COCO Dataset Facts and Metrics
COCO Dataset Tasks
The COCO dataset supports multiple computer vision tasks, listed here in descending order of commonality:
- Object Detection - Objects are annotated with a bounding box and class label
- Semantic Segmentation - Object boundaries are labeled with a mask, and object classes are labeled with a class label
- Keypoint Detection - Humans are labeled with key points of interest (elbow, knee, etc.)
COCO Dataset Facts
- The COCO Dataset has 121,408 images
- The COCO Dataset has 883,331 object annotations
- The COCO Dataset has 80 classes
- The COCO Dataset median image resolution is 640 x 480
- Panoptic Segmentation requires models to draw boundaries between objects in semantic segmentation
- The COCO Dataset has 250,000 people labeled with keypoints
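Figures like the ones above can be recomputed from an annotation file. Here is a minimal sketch, assuming the file follows the usual COCO layout with `images`, `annotations`, and `categories` lists. The `summarize` helper and the tiny in-memory sample are hypothetical stand-ins for a real annotation file.

```python
from statistics import median

def summarize(coco):
    """Return basic counts and the median image size for a COCO-style dict."""
    images = coco["images"]
    return {
        "num_images": len(images),
        "num_annotations": len(coco["annotations"]),
        "num_classes": len(coco["categories"]),
        "median_width": median(img["width"] for img in images),
        "median_height": median(img["height"] for img in images),
    }

# Made-up sample data; a real file would be loaded with json.load()
sample = {
    "images": [
        {"id": 1, "width": 640, "height": 480},
        {"id": 2, "width": 640, "height": 480},
        {"id": 3, "width": 500, "height": 375},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1},
        {"id": 2, "image_id": 1, "category_id": 1},
        {"id": 3, "image_id": 2, "category_id": 2},
        {"id": 4, "image_id": 3, "category_id": 1},
    ],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "dog"}],
}

stats = summarize(sample)
print(stats)
```

On the full dataset, the same function would report the image, annotation, and class counts listed above, provided the annotation file uses this layout.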
COCO Dataset Class List
Here is a list of the class labels in the COCO dataset.
In the COCO dataset class list, we can see that the COCO dataset is heavily biased towards major classes, such as person, and only sparsely populated with minor classes, such as toaster. As a result, it is hard to train models on the COCO dataset to recognize underrepresented classes. See here for more on balancing classes in object detection.
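One way to see this imbalance for yourself is to count annotations per class. The sketch below assumes the usual COCO layout, where each annotation carries a `category_id` that maps to an entry in `categories`. The `class_counts` helper and the sample counts are made up for illustration and are not real COCO statistics.

```python
from collections import Counter

def class_counts(coco):
    """Count annotations per class name in a COCO-style dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])

# Made-up sample: five person annotations and one toaster annotation
sample = {
    "categories": [{"id": 1, "name": "person"}, {"id": 80, "name": "toaster"}],
    "annotations": (
        [{"id": i, "category_id": 1} for i in range(5)]
        + [{"id": 5, "category_id": 80}]
    ),
}

counts = class_counts(sample)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

Sorting with `most_common()` puts the dominant classes first, which makes the long tail of rare classes easy to spot.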
COCO Dataset Explorer
You can explore the COCO dataset by using the COCO dataset explorer. To use the explorer, simply visit the COCO dataset explorer page.
Downloading the COCO Dataset
To download the COCO dataset, you can visit the download link on the COCO dataset page.
Additionally, here is a Python script to download the object detection portion of the COCO dataset to your local drive.
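For reference, a download helper might look roughly like the sketch below. This is not the linked script. It assumes the 2017 archives are served from `images.cocodataset.org` under the paths shown, so check the COCO download page before relying on it. The function names `coco_detection_urls` and `download` are hypothetical.

```python
import os
import urllib.request

# Assumed base URL of the public COCO file server; verify on the COCO download page
BASE_URL = "http://images.cocodataset.org"

def coco_detection_urls(year="2017"):
    """Build URLs for the train/val images and the matching annotation archive."""
    return [
        f"{BASE_URL}/zips/train{year}.zip",
        f"{BASE_URL}/zips/val{year}.zip",
        f"{BASE_URL}/annotations/annotations_trainval{year}.zip",
    ]

def download(url, dest_dir="coco"):
    """Download one archive into dest_dir, skipping files that already exist."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest

urls = coco_detection_urls()
for url in urls:
    # Only prints the URLs; call download(url) to actually fetch the archive
    print(url)
```

The loop only prints the URLs because the image archives are many gigabytes. Call `download(url)` for each one when you are ready to fetch them.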
The COCO Dataset Format
The COCO dataset is distributed in a special format called COCO JSON.
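To give a feel for the format, here is a minimal sketch of an object detection annotation file. It assumes the common layout of three top-level lists (`images`, `annotations`, `categories`) and bounding boxes stored as `[x, y, width, height]` in pixels. The file name, IDs, and coordinates are made up, and `bbox_to_corners` is a hypothetical helper.

```python
import json

# A tiny, made-up COCO JSON document with one image and one annotation
coco_json = """
{
  "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1,
     "bbox": [10, 20, 30, 40], "area": 1200, "iscrowd": 0}
  ],
  "categories": [{"id": 1, "name": "person", "supercategory": "person"}]
}
"""

def bbox_to_corners(bbox):
    """Convert [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)

data = json.loads(coco_json)
names = {c["id"]: c["name"] for c in data["categories"]}
for ann in data["annotations"]:
    print(names[ann["category_id"]], bbox_to_corners(ann["bbox"]))
```

Many training frameworks expect corner coordinates rather than width and height, so a conversion like this is often the first step when loading COCO JSON.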
We have discussed the details of the COCO dataset including COCO dataset motivations, COCO dataset facts, COCO dataset tasks, and COCO dataset metrics.
Now you should have a good sense of what the COCO dataset is and why so many people in computer vision use it!
As always, happy building.