DIY labeling with CVAT

CVAT (Computer Vision Annotation Tool) is a free, open source image annotation tool from the OpenCV project. Its interface is designed to make labeling computer vision datasets easier.

In this post, we will focus on using CVAT to create object detection annotations on images, although it has many more capabilities, including video annotation, semantic segmentation, and polygon annotation.

CVAT labeled image for computer vision

CVAT is one of several similar DIY labeling tools, including LabelImg. We recommend labeling a batch of images yourself (50+) and training a state-of-the-art model like YOLOv4 to see whether your computer vision task is already solvable with current technology.

Label and Annotate Data with Roboflow for free

Use Roboflow to manage datasets, label data, and convert to 26+ formats for using different models. Roboflow is free up to 10,000 images, cloud-based, and easy for teams.

I will show the steps I used to annotate the public aerial maritime object detection dataset, which was captured from a drone. Although a specific dataset is used, this post is meant to be a general guide on how to label an object detection dataset and how to use labeling tools for object detection. Feel free to use another similar aerial imagery dataset.

CVAT Object Detection Video Tutorial. Subscribe to our YouTube for more!

CVAT Quickstart

If this is your first time using CVAT, start by launching the CVAT website, which is the quickest way to start labeling your data.

Once you are on the CVAT website, you will see a page like this:

CVAT Master Task Page

Launch New CVAT Task

From there, you can launch a new task in CVAT and drag your images in for labeling. You are also prompted to specify the class labels of the objects that you would like to detect. Carefully specify these.
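If your CVAT version offers a raw JSON view in its label editor, you can prepare the class list in code instead of typing each label by hand. The following is a minimal sketch of that idea. The class names are hypothetical, and the JSON fields used here (`name` and `attributes`) are an assumption that may differ between CVAT versions, so compare the output with what your own instance shows before pasting it in.

```python
import json


def build_label_spec(class_names):
    """Return a JSON string with one label entry per class name.

    Rejects empty and duplicate names (case-insensitive), since careless
    label lists are hard to fix once annotation has started.
    """
    seen = set()
    labels = []
    for name in class_names:
        cleaned = name.strip()
        if not cleaned:
            raise ValueError("empty class name")
        key = cleaned.lower()
        if key in seen:
            raise ValueError(f"duplicate class name: {cleaned}")
        seen.add(key)
        # Assumed field layout; verify against your CVAT version.
        labels.append({"name": cleaned, "attributes": []})
    return json.dumps(labels, indent=2)


# Hypothetical class names for an aerial maritime dataset.
print(build_label_spec(["dock", "boat", "lift"]))
```

Validating the list up front is the point of the sketch: renaming or merging classes after labeling has started is much more painful than catching a duplicate here.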

Once your data is uploaded, navigate back to tasks. From there, you will see a task page.

CVAT Task Page

Enter CVAT Labeling Job

CVAT automatically sets up a labeling job when you create the task, and you can create further jobs to annotate the dataset. Note the hierarchy: a task contains one or more jobs.

Now you can click into your labeling task and get to work!

When you're in the labeling screen you will see the following.

Photo of an image in my labeling task at

Drawing Annotations in CVAT

You can click "Create Shape" and draw a box around the object you want your detector to detect. Then on the right hand side, you will see the color of the box that you just drew. You can choose among the class labels that you provided when setting up the task.

Exporting Annotations From CVAT

First, click "Save". CVAT does not automatically save your work.

Then click "Menu". In CVAT you will see the following options:

Menu from CVAT on my labeling task

Then click "Export task dataset" and choose among the available formats: Pascal VOC XML, COCO JSON, YOLO, etc.
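The export formats differ mainly in how they encode a box. As an illustration, COCO JSON stores a box as [x_min, y_min, width, height] in pixels, while YOLO text files store the box center and size normalized by the image dimensions. Here is a minimal sketch of that conversion; the function name and sample numbers are made up for illustration and are not part of CVAT.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to a
    YOLO box (x_center, y_center, width, height) normalized to 0..1."""
    x_min, y_min, width, height = bbox
    x_center = (x_min + width / 2) / image_width
    y_center = (y_min + height / 2) / image_height
    return (x_center, y_center, width / image_width, height / image_height)


# A 200x100 pixel box with its top-left corner at (100, 50) in an 800x400 image.
print(coco_to_yolo([100, 50, 200, 100], image_width=800, image_height=400))
# → (0.25, 0.25, 0.25, 0.25)
```

In practice you rarely need to convert by hand, since the export menu already offers both formats, but knowing the difference helps when a training script complains about box coordinates.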

Congrats! Now you have a labeled dataset.

CVAT on Local for Serious CVAT

If you are serious about CVAT, you can run it locally. The hosted CVAT website has these limitations:

  • No more than 10 tasks per user
  • Uploaded data is limited to 500 MB

When running locally, you are not subject to these limitations because your own machine does the heavy lifting.
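Before deciding between the hosted site and a local install, it can help to check how large your image folder actually is. The sketch below sums file sizes under a folder and compares the total with the upload limit. It assumes the 500 MB limit means 500 × 1024 × 1024 bytes, which is a guess about how the site counts, and the helper names are hypothetical.

```python
import os
import tempfile

# Assumption: the hosted upload limit is 500 MiB.
UPLOAD_LIMIT_BYTES = 500 * 1024 * 1024


def total_size_bytes(folder):
    """Sum the sizes of all files under `folder`, including subfolders."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def fits_hosted_limit(folder, limit=UPLOAD_LIMIT_BYTES):
    """Return True if the folder's total size is within the upload limit."""
    return total_size_bytes(folder) <= limit


# Demo on a throwaway folder holding one 2 KiB placeholder file.
with tempfile.TemporaryDirectory() as demo_folder:
    with open(os.path.join(demo_folder, "frame_0001.jpg"), "wb") as handle:
        handle.write(b"\0" * 2048)
    print(total_size_bytes(demo_folder), fits_hosted_limit(demo_folder))
```

Point `fits_hosted_limit` at your own image folder to see whether it would fit in a single hosted upload.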

To launch CVAT locally, first clone the CVAT repository in your terminal window.

git clone
cd cvat

Then, if you don't have Docker, install it. Check that Docker is successfully installed:

docker version

Now build CVAT locally and launch it with:

docker-compose build
docker-compose up -d

This will take a while to run, since it builds CVAT's dependencies on your local machine.

Then create your user within your local CVAT service by running a command inside its container:

docker exec -it cvat bash -ic 'python3 ~/ createsuperuser'

Now, navigate to your browser and type


This will navigate to your local CVAT!

You can come back later and restart the service. If you are having trouble logging into CVAT, you can rebuild with the --no-cache flag:

docker-compose build --no-cache 
docker-compose up -d

CVAT Labeling Tips, Tricks, Best Practices

When you're working in CVAT, annotate objects carefully with your downstream model in mind. Follow these labeling best practices as you work through your dataset:

1) Label entirely around the object

2) For occluded objects - label them entirely

3) Generally label objects that are partially out of frame

4) Beware of labeling many boxes that overlap or are entirely contained within each other. This can really confuse your model.
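The last point can be checked automatically after a labeling session. The sketch below flags pairs of boxes where one sits entirely inside the other, or where the two overlap heavily as measured by intersection over union (IoU). Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples in pixels. The 0.8 threshold is an arbitrary starting point rather than a recommended value, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def contains(outer, inner):
    """True if `inner` lies entirely within `outer` (edges may touch)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def flag_suspicious_pairs(boxes, iou_threshold=0.8):
    """Return (i, j, reason) for every pair of boxes worth a second look."""
    flagged = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            a, b = boxes[i], boxes[j]
            if contains(a, b) or contains(b, a):
                flagged.append((i, j, "contained"))
            elif iou(a, b) >= iou_threshold:
                flagged.append((i, j, "overlap"))
    return flagged


boxes = [(0, 0, 100, 100), (10, 10, 50, 50), (200, 200, 300, 300), (205, 200, 305, 300)]
print(flag_suspicious_pairs(boxes))
```

A flagged pair is not necessarily a mistake (a boat moored at a dock may legitimately sit inside the dock's box), so treat the output as a review list rather than an error list.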

CVAT shortcuts:

  • Start your labels list with the most represented class - it will be the default when you draw a box
  • Label all objects in each class first - you can focus on them and change all of their labels at once
  • Type "N" to draw a new box
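To apply the first tip, you need a rough idea of which class is most common. One way to get it, sketched below, is to tally the class names from a small sample you have already labeled and order your label list by frequency. The sample data and function name are hypothetical.

```python
from collections import Counter


def order_labels_by_frequency(sample_labels):
    """Return the distinct class names, most frequent first."""
    counts = Counter(sample_labels)
    return [name for name, _count in counts.most_common()]


# Class names of boxes drawn in a small, hypothetical pilot batch.
sample = ["boat", "dock", "boat", "lift", "boat", "dock"]
print(order_labels_by_frequency(sample))
# → ['boat', 'dock', 'lift']
```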

CVAT Alternatives

CVAT is just one of many computer vision labeling tools. If you're wondering whether it's right for you, you may want to read our Ultimate Guide to Object Detection or try Roboflow Annotate, which is designed to smooth out many of the rough edges of open source tools like CVAT.

Looking to Get Started with Annotating Data?

Roboflow provides easy annotation with smart auto-suggested defaults. It's no surprise users annotate faster with Roboflow.

Next Steps after Labeling Your Computer Vision Dataset in CVAT

Once your dataset is labeled in CVAT, it is time to move on to building your computer vision model!

Roboflow makes it easy to load in your data (just drag and drop your images and your annotation file from CVAT). You can generate even more data with augmentations such as flipping, random cropping, and synthetic data generation. If you are interested in using data augmentation to increase the number of training images (and spend less time in CVAT), see our guide on data augmentation in computer vision.
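Augmentation saves labeling time because the annotations can be transformed along with the image. As a small illustration of the idea (not a description of how Roboflow implements it), this is how a box in [x_min, y_min, width, height] pixel form moves when the image is flipped horizontally. The function name is hypothetical.

```python
def flip_box_horizontally(bbox, image_width):
    """Mirror a [x_min, y_min, width, height] box across the vertical
    center line of an image that is `image_width` pixels wide."""
    x_min, y_min, width, height = bbox
    # The box's old right edge becomes its new left edge, measured from
    # the opposite side of the image.
    new_x_min = image_width - (x_min + width)
    return [new_x_min, y_min, width, height]


print(flip_box_horizontally([100, 50, 200, 100], image_width=800))
# → [500, 50, 200, 100]
```

Flipping the result again returns the original box, which makes a quick sanity check for any augmentation code you write yourself.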

When you are ready, use Roboflow Train to train a model with one click and quickly test it using our web app or your webcam. Alternatively, you can export your data from Roboflow to any format and start training your own computer vision model. Our posts on How to Train YOLOv4 and How to Train EfficientDet are good starting points. Once you have evaluated your model, you can gauge how much more data you may need to collect and annotate.

Output of model inference finding bounding boxes for docks and lifts in aerial photo taken from a drone.
Inference after training only 74 aerial drone images