The Microsoft COCO dataset is the gold standard benchmark for evaluating the performance of state-of-the-art computer vision models. Despite its wide use in the computer vision research community, the COCO dataset is less well known to general practitioners.

In this post, we will dive into the COCO dataset, explaining the motivation for the dataset and exploring dataset facts and metrics.

Why The COCO Dataset?

The COCO dataset stands for Common Objects in Context, and is designed to represent a vast array of objects that we regularly encounter in everyday life.

The Computer Vision Benchmark

The COCO dataset is labeled, providing data to train supervised computer vision models that can identify the common objects in the dataset. These models are still far from perfect, so the COCO dataset also serves as a benchmark for tracking how much they improve as computer vision research progresses.

See this link for a snapshot of the current COCO Dataset leaderboard.

A Checkpoint for Transfer Learning

Another motivation for the COCO dataset is to provide a base dataset for training computer vision models. Once a model is trained on the COCO dataset, it can be fine-tuned on a custom dataset to learn other tasks.

Here is a tutorial to get started with transfer learning from the COCO dataset.

COCO Dataset Facts and Metrics

COCO Dataset Tasks

The COCO dataset contains multiple computer vision tasks, listed here in descending order of commonality:

  • Object Detection - Objects are labeled with a bounding box and a class label
Cow and Giraffe objects in the COCO object detection dataset
  • Semantic Segmentation - The boundaries of objects are labeled with a mask, and object classes are labeled with a class label
COCO Panoptic Segmentation (cite)
  • Keypoint Detection - Humans are labeled with key points of interest (elbow, knee, etc.)
COCO Keypoint Detection (cite)

COCO Dataset Facts

Object Detection

  • The COCO Dataset has 121,408 images
  • The COCO Dataset has 883,331 object annotations
  • The COCO Dataset has 80 classes
  • The COCO Dataset median image size is 640 x 480 pixels
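
Figures like these can be recomputed from an annotation file. The sketch below assumes the standard top-level COCO JSON keys (`images`, `annotations`, `categories`) and uses a tiny hand-made dictionary in place of a real file. The helper name `summarize` and all sample values are illustrative, not part of any COCO tooling.

```python
from statistics import median

# Tiny stand-in for a loaded COCO annotation file (all values are made up).
coco = {
    "images": [
        {"id": 1, "width": 640, "height": 480},
        {"id": 2, "width": 640, "height": 427},
        {"id": 3, "width": 500, "height": 375},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 3, "category_id": 1},
    ],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "toaster"}],
}


def summarize(coco):
    """Return image, annotation, and class counts plus median image size."""
    return {
        "images": len(coco["images"]),
        "annotations": len(coco["annotations"]),
        "classes": len(coco["categories"]),
        "median_width": median(img["width"] for img in coco["images"]),
        "median_height": median(img["height"] for img in coco["images"]),
    }


print(summarize(coco))
```

To run this against real data, the `coco` dictionary could be replaced with the result of `json.load` on a downloaded annotation file, assuming that file follows the same layout.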

Semantic Segmentation

  • Panoptic segmentation extends semantic segmentation by requiring models to draw boundaries between individual objects

Keypoint Detection

  • 250,000 people with keypoints labeled
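
In the COCO keypoint annotations, each person's keypoints are stored as one flat list of `x, y, v` triples, where `v` is a visibility flag (0 for not labeled, 1 for labeled but not visible, 2 for labeled and visible). The sketch below groups such a list into triples. The helper name `parse_keypoints` and the sample coordinates are hypothetical, and the example uses three keypoints rather than a full set to stay short.

```python
def parse_keypoints(flat):
    """Group a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


# Made-up example: three keypoints for one person.
flat = [120, 80, 2, 0, 0, 0, 130, 95, 1]
points = parse_keypoints(flat)

labeled = [p for p in points if p[2] > 0]   # v == 1 or v == 2
visible = [p for p in points if p[2] == 2]  # v == 2 only

print(len(points), len(labeled), len(visible))
```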

COCO Dataset Class List

Here is a list of the class labels in the COCO dataset.

COCO dataset validation set class list (Roboflow dataset health check)

In the COCO dataset class list, we can see that the COCO dataset is heavily biased toward major classes such as person and only lightly populated with minor classes such as toaster. Because of this imbalance, it is hard to train models on the COCO dataset to recognize the underrepresented classes. See here for more on balancing classes in object detection.
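
One way to check this kind of imbalance yourself is to count annotations per class. The following is a minimal sketch that assumes annotations carry a `category_id` that maps to an entry in `categories`, as in COCO JSON. The function name `class_counts`, the category ids, and the sample counts are made up for illustration.

```python
from collections import Counter

# Illustrative ids and annotations; not real COCO values.
categories = [{"id": 1, "name": "person"}, {"id": 2, "name": "toaster"}]
annotations = [
    {"image_id": 1, "category_id": 1},
    {"image_id": 1, "category_id": 1},
    {"image_id": 2, "category_id": 1},
    {"image_id": 3, "category_id": 2},
]


def class_counts(categories, annotations):
    """Count annotations per class name."""
    names = {c["id"]: c["name"] for c in categories}
    return Counter(names[a["category_id"]] for a in annotations)


counts = class_counts(categories, annotations)
for name, n in counts.most_common():
    print(f"{name}: {n}")

# Ratio between the most and least frequent class as a rough imbalance signal.
imbalance = max(counts.values()) / min(counts.values())
print(f"imbalance ratio: {imbalance:.1f}")
```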

COCO Dataset Explorer

You can explore the COCO dataset by using the COCO dataset explorer. To use the explorer, simply visit the COCO dataset explorer page.

COCO explorer - no images contain both sheep and pizza!
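
A co-occurrence query like this one can also be approximated offline from the annotations. The sketch below is not how the explorer itself works; it simply collects the set of categories seen in each image and keeps the images that contain all requested categories. The function name `images_with_all` and the category ids are hypothetical.

```python
# Illustrative category ids; real COCO ids differ.
SHEEP, PIZZA, PERSON = 1, 2, 3

annotations = [
    {"image_id": 1, "category_id": SHEEP},
    {"image_id": 1, "category_id": PERSON},
    {"image_id": 2, "category_id": PIZZA},
    {"image_id": 2, "category_id": PERSON},
    {"image_id": 3, "category_id": SHEEP},
]


def images_with_all(annotations, category_ids):
    """Return sorted ids of images that contain every requested category."""
    wanted = set(category_ids)
    seen = {}
    for ann in annotations:
        seen.setdefault(ann["image_id"], set()).add(ann["category_id"])
    return sorted(img for img, cats in seen.items() if wanted <= cats)


print(images_with_all(annotations, [SHEEP, PIZZA]))
print(images_with_all(annotations, [SHEEP, PERSON]))
```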

Downloading the COCO Dataset

To download the COCO dataset you can visit the download link on the COCO dataset page.

Additionally, here is a bash script to download the object detection portion of the COCO dataset to your local drive.

#!/bin/bash
# COCO 2017 dataset http://cocodataset.org
# Download command: bash data/scripts/get_coco.sh
# Train command: python train.py --data coco.yaml
# Default dataset location is next to /yolov5:
#   /parent_folder
#     /coco
#     /yolov5

# Download/unzip labels
d='../' # unzip directory
url=https://github.com/ultralytics/yolov5/releases/download/v1.0/
f='coco2017labels.zip'                                                                 # 68 MB
echo 'Downloading' $url$f ' ...' && curl -L $url$f -o $f && unzip -q $f -d $d && rm $f # download, unzip, remove

# Download/unzip images
d='../coco/images' # unzip directory
url=http://images.cocodataset.org/zips/
f1='train2017.zip' # 19G, 118k images
f2='val2017.zip'   # 1G, 5k images
f3='test2017.zip'  # 7G, 41k images (optional)
for f in $f1 $f2; do
  echo 'Downloading' $url$f ' ...' && curl -L $url$f -o $f && unzip -q $f -d $d && rm $f # download, unzip, remove
done
(source)

The COCO Dataset Format

The COCO dataset annotations are distributed in a JSON-based format called COCO JSON.
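
To make the format concrete, here is a minimal sketch of an object detection annotation file and one way to read it. It assumes the commonly used top-level keys `images`, `categories`, and `annotations`, with annotations linked to images and categories by id. The file name, ids, and box values are made up, and real files contain additional fields that are omitted here.

```python
import json

# A tiny hand-written COCO-style document (all values are illustrative).
coco_json = """
{
  "images": [
    {"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}
  ],
  "categories": [
    {"id": 1, "name": "person", "supercategory": "person"}
  ],
  "annotations": [
    {"id": 100, "image_id": 1, "category_id": 1,
     "bbox": [100.0, 50.0, 200.0, 300.0], "iscrowd": 0}
  ]
}
"""

data = json.loads(coco_json)

# Build lookup tables so each annotation can be joined to its image and class.
images = {img["id"]: img for img in data["images"]}
names = {cat["id"]: cat["name"] for cat in data["categories"]}

rows = [
    (images[ann["image_id"]]["file_name"], names[ann["category_id"]], ann["bbox"])
    for ann in data["annotations"]
]

for file_name, class_name, bbox in rows:
    # bbox is [x_min, y_min, width, height] in pixels.
    print(file_name, class_name, bbox)
```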

See here to convert other annotation formats to COCO JSON (for example, Pascal VOC to COCO JSON) or to convert COCO JSON to other annotation formats.
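
The core of a Pascal VOC to COCO conversion is the bounding box representation: COCO stores boxes as `[x_min, y_min, width, height]`, while Pascal VOC stores the corners `xmin, ymin, xmax, ymax`. The sketch below shows only that coordinate step, under the assumption that both sides use the same pixel origin. It ignores off-by-one indexing conventions and the XML/JSON file handling a full converter needs, and the function names are hypothetical.

```python
def coco_to_voc(bbox):
    """Convert [x_min, y_min, width, height] to [xmin, ymin, xmax, ymax]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def voc_to_coco(box):
    """Convert [xmin, ymin, xmax, ymax] to [x_min, y_min, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


coco_box = [100, 50, 200, 300]
voc_box = coco_to_voc(coco_box)
print(voc_box)
print(voc_to_coco(voc_box))
```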

Conclusion

We have discussed the details of the COCO dataset including COCO dataset motivations, COCO dataset facts, COCO dataset tasks, and COCO dataset metrics.

Now you should have a good sense of what the COCO dataset is and why so many people in computer vision use it!

As always, happy building.