Microsoft COCO Classes Reference List
Published Oct 15, 2024

The Microsoft COCO dataset is commonly used to benchmark and evaluate computer vision model architectures. It is also commonly used to train "base weights" that you can then fine-tune on custom data through transfer learning. The most recent COCO challenge, held in 2020, included data for object detection, keypoint detection, panoptic segmentation, and dense pose estimation.

For object detection, Microsoft COCO defines 80 classes. These are, in alphabetical order:

airplane
apple
backpack
banana
baseball bat
baseball glove
bear
bed
bench
bicycle
bird
boat
book
bottle
bowl
broccoli
bus
cake
car
carrot
cat
cell phone
chair
clock
couch
cow
cup
dining table
dog
donut
elephant
fire hydrant
fork
frisbee
giraffe
hair drier
handbag
horse
hot dog
keyboard
kite
knife
laptop
microwave
motorcycle
mouse
orange
oven
parking meter
person
pizza
potted plant
refrigerator
remote
sandwich
scissors
sheep
sink
skateboard
skis
snowboard
spoon
sports ball
stop sign
suitcase
surfboard
teddy bear
tennis racket
tie
toaster
toilet
toothbrush
traffic light
train
truck
tv
umbrella
vase
wine glass
zebra
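If you want to check labels against this list in code, the sketch below may help. It stores the 80 names in alphabetical order and flags any label that is not a COCO object detection class. The names `COCO_CLASSES`, `is_coco_class`, and `non_coco_labels` are hypothetical helpers made up for this example, not part of any library. Note that a name's position in this alphabetical list is not the class ID used by the dataset or by any trained model.

```python
# The 80 Microsoft COCO object detection class names, in alphabetical order.
# Assumption: list positions here are NOT dataset or model class IDs.
COCO_CLASSES = [
    "airplane", "apple", "backpack", "banana", "baseball bat",
    "baseball glove", "bear", "bed", "bench", "bicycle",
    "bird", "boat", "book", "bottle", "bowl",
    "broccoli", "bus", "cake", "car", "carrot",
    "cat", "cell phone", "chair", "clock", "couch",
    "cow", "cup", "dining table", "dog", "donut",
    "elephant", "fire hydrant", "fork", "frisbee", "giraffe",
    "hair drier", "handbag", "horse", "hot dog", "keyboard",
    "kite", "knife", "laptop", "microwave", "motorcycle",
    "mouse", "orange", "oven", "parking meter", "person",
    "pizza", "potted plant", "refrigerator", "remote", "sandwich",
    "scissors", "sheep", "sink", "skateboard", "skis",
    "snowboard", "spoon", "sports ball", "stop sign", "suitcase",
    "surfboard", "teddy bear", "tennis racket", "tie", "toaster",
    "toilet", "toothbrush", "traffic light", "train", "truck",
    "tv", "umbrella", "vase", "wine glass", "zebra",
]

# A set gives constant-time membership checks.
_COCO_CLASS_SET = frozenset(COCO_CLASSES)


def is_coco_class(name: str) -> bool:
    """Return True if `name` matches a COCO class, ignoring case and outer whitespace."""
    return name.strip().lower() in _COCO_CLASS_SET


def non_coco_labels(labels: list[str]) -> list[str]:
    """Return the labels that are not COCO classes, in their original order."""
    return [label for label in labels if not is_coco_class(label)]
```

For example, `non_coco_labels(["dog", "forklift", "Zebra"])` would return only `"forklift"`, which tells you that a model trained on COCO alone has no class for that object and that you would need custom data to detect it.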

To learn more about Microsoft COCO, refer to our Introduction to the COCO dataset.

You can also try models trained on the Microsoft COCO dataset using various architectures on the Roboflow Universe Microsoft COCO playground.

Cite this Post

Use the following entry to cite this post in your research:

James Gallagher. (Oct 15, 2024). Microsoft COCO Classes Reference List. Roboflow Blog: https://blog.roboflow.com/microsoft-coco-classes/

Written by

James Gallagher
James is a technical writer at Roboflow, with experience writing documentation on how to train and use state-of-the-art computer vision models.