DIY labeling with CVAT

CVAT (Computer Vision Annotation Tool) is an OpenCV project that provides an easy-to-use interface for labeling computer vision datasets.

CVAT is a free, open source image annotation tool. Common computer vision use cases that CVAT supports include image classification, object detection, object tracking, image segmentation, and pose estimation.

In this post, we will focus on CVAT's ability to create object detection annotations on images, although it has many more capabilities, including video annotation, semantic segmentation, and polygon annotation.

For an extremely detailed guide of every element in the interface, refer to CVAT's documentation.

CVAT Object Detection Video Tutorial. Subscribe to our YouTube for more!

CVAT is one of a group of similar DIY labeling tools, which also includes the LabelImg computer vision labeling tool.

We recommend labeling a batch of images yourself (50+) and training a state-of-the-art model like YOLOv4 to see if your computer vision task is already solved by current technologies.

Label and Annotate Data with Roboflow for free

Use Roboflow to manage datasets, label data, and convert to 26+ formats for using different models. Roboflow is free up to 10,000 images, cloud-based, and easy for teams.

We will show the steps used to annotate the public aerial maritime object detection dataset, taken from a drone. Although a specific dataset is used, this post is meant as a general guide on how to label an object detection dataset and how to use labeling tools for object detection. Feel free to substitute another similar aerial imagery dataset.

CVAT labeled image for computer vision

CVAT Quickstart

If this is your first time encountering CVAT, start by launching the CVAT website, which is the quickest way to begin labeling your data.

Once inside the CVAT website, you will see a page like this:

CVAT Master Task Page

Launch New CVAT Task

From there, you can launch a new task in CVAT and drag your images in for labeling. You are prompted to specify the class labels of the objects that you would like to detect. Specify these carefully, because you want to be sure you have all of the needed classes before you start labeling.
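Besides typing labels one at a time, CVAT's task constructor also accepts the label list as JSON via its "Raw" tab. A minimal sketch that generates such a list is below; the class names are hypothetical stand-ins for the aerial maritime dataset's classes.

```python
import json

# Hypothetical class names for the aerial maritime dataset used in this post.
class_names = ["dock", "lift", "boat", "jetski", "car"]

# Each label entry needs at least a "name"; "attributes" can stay empty.
labels = [{"name": name, "attributes": []} for name in class_names]

# Paste the printed JSON into the "Raw" tab of CVAT's task constructor.
print(json.dumps(labels, indent=2))
```

Defining all classes up front this way avoids editing the label schema mid-project.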

Once your data is uploaded, navigate back to tasks. From there, you will see a task page.

CVAT Task Page

Enter CVAT Labeling Job

When you created the task, a CVAT labeling job was set up automatically; you can also create additional jobs to annotate this dataset. Note the task/job hierarchy: each task is split into one or more jobs.

Now you can click into your labeling job and get to work. In the labeling screen, you will see the following.

Photo of an image in my labeling task at https://cvat.org/

Drawing Annotations in CVAT

CVAT offers multiple types of shapes for annotation: rectangle (bounding box), polygon, polyline, points, ellipse, cuboid, and tag. Below are examples of how those tools can be used.

You can click "Create Shape" and draw a box around the object you want your detector to detect.

Labeling in CVAT with a rectangle

Bounding boxes may work for your project, but if you need a tighter outline of your objects for better performance, you can use the polygon tool. You also have the option of using the similar polyline annotation tool. Below is a video of the polygon option.

Labeling in CVAT with the polygon tool
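Rectangle and polygon labels are closely related: any polygon can be collapsed into the bounding box a rectangle tool would have produced, which is useful if you later want to train a detector on polygon-labeled data. A minimal sketch with hypothetical vertex coordinates:

```python
# Collapse a polygon label (a list of (x, y) vertices) into the tightest
# axis-aligned bounding box that contains it.
def polygon_to_bbox(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)  # (xmin, ymin, xmax, ymax)

# Hypothetical outline of a dock in an aerial image.
dock_outline = [(120, 80), (180, 75), (200, 140), (115, 150)]
print(polygon_to_bbox(dock_outline))  # -> (115, 75, 200, 150)
```

The reverse is not true: a box cannot recover the tighter outline, which is why segmentation projects should label polygons from the start.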

In addition to manual labeling tools, CVAT offers model assisted labeling which allows you to use artificial intelligence to automate labeling work. In the AI Tools section, you'll find Interactors and Detectors which help create polygons.

Interactors include Deep Extreme Cut (DEXTR), Feature Backpropagating Refinement Scheme (f-BRS), High-Resolution Net (HRNet), and Inside-Outside Guidance.

DEXTR Interactor example

Detectors include Mask R-CNN and Faster R-CNN, deep learning models trained on a fixed set of labels. If your dataset includes objects that these models support, you will be able to label those objects very quickly, because the labels are applied automatically.

Example of Detector options in CVAT

CVAT's variety of labeling options gives you different ways to annotate your data for a given project's purpose. When using CVAT, it's important to define your classes and labeling method before starting the project so that you do not need to spend time updating annotations later.

Exporting Annotations From CVAT

First, click "Save": CVAT does not automatically save your work.

Then click "Menu", in CVAT you will see the following options:

Menu from CVAT on my labeling task

Then click "Export task dataset" and choose among the available formats: Pascal VOC XML, COCO JSON, YOLO annotations, and so on. Before exporting, be sure you know the format needed to train your model of choice. In the event you need to convert your annotations to a format CVAT does not support, Roboflow offers free dataset conversion to 26+ formats.
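It is worth sanity-checking an export before training on it. The sketch below parses bounding boxes from a minimal Pascal VOC-style XML annotation; the file name, class, and coordinates are hypothetical examples, not taken from a real export.

```python
import xml.etree.ElementTree as ET

# A minimal Pascal VOC annotation like those in a CVAT "PASCAL VOC" export.
# All values below are hypothetical.
voc_xml = """
<annotation>
  <filename>aerial_0001.jpg</filename>
  <object>
    <name>dock</name>
    <bndbox><xmin>115</xmin><ymin>75</ymin><xmax>200</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>
"""

root = ET.fromstring(voc_xml)
# Collect (class_name, (xmin, ymin, xmax, ymax)) for every labeled object.
boxes = [
    (obj.findtext("name"),
     tuple(int(obj.findtext("bndbox/" + k)) for k in ("xmin", "ymin", "xmax", "ymax")))
    for obj in root.iter("object")
]
print(boxes)  # -> [('dock', (115, 75, 200, 150))]
```

A quick pass like this catches empty annotation files or swapped coordinates before they silently degrade training.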

Congrats! Now you have a labeled dataset.

Running CVAT Locally for Serious Use

If you are serious about CVAT, you can run it locally. The hosted CVAT website has these limitations:

  • No more than 10 tasks per user
  • Uploaded data is limited to 500 MB

Locally, you will not be subject to these limitations because your machine will be doing the heavy lifting.

To launch CVAT locally, first clone the CVAT repository in your terminal window.

git clone https://github.com/opencv/cvat.git
cd cvat

Then, if you don't have Docker, install Docker. Verify that Docker is successfully installed:

docker version

Now build CVAT locally and launch it with:

docker-compose build
docker-compose up -d

This will take a while to run, as it builds CVAT's dependencies on your local machine.

Then create your username within your local CVAT service by executing into the container:

docker exec -it cvat bash -ic 'python3 ~/manage.py createsuperuser'

Now, navigate to your browser and type

http://localhost:8080/

This will open your local CVAT instance!

You can come back later and restart the service. If you are having trouble logging into CVAT, you can rebuild with the no-cache flag:

docker-compose build --no-cache 
docker-compose up -d

CVAT Labeling Tips, Tricks, and Best Practices

When you're operating in CVAT, carefully annotate objects with your downstream model in mind. Keep these labeling best practices in mind while working through your dataset:

1) Label entirely around the object

2) For occluded objects - label them entirely

3) Generally label objects that are partially out of frame

4) Beware of labeling many boxes that overlap or are entirely contained within each other. This can really confuse your model.
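Tip 4 can even be checked programmatically: compute intersection over union (IoU) between pairs of boxes in a labeled image and flag high values for review. A minimal sketch, with hypothetical box coordinates:

```python
# Intersection over union between two (xmin, ymin, xmax, ymax) boxes.
# High IoU between two labels in the same image often signals a duplicate
# or contained box worth reviewing.
def iou(a, b):
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # -> 0.14285714285714285
```

A threshold such as IoU > 0.8 between same-class boxes is a reasonable starting point for flagging likely duplicates.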


CVAT shortcuts:

  • Start your labels list with the most represented class - it will be the default when you draw a box
  • Label all objects in each class first - you can focus on them and change all of their labels at once
  • Press "N" to draw a new box

CVAT Alternatives

CVAT is just one of many computer vision labeling tools. If you're wondering if it's right for you, you may want to read our Ultimate Guide to Object Detection or try Roboflow Annotate, which is designed to simplify many of the rough edges open source tools like CVAT have.

Common CVAT alternatives include LabelImg, mentioned above, and Roboflow Annotate.

Looking to Get Started with Annotating Data?

Roboflow provides easy annotation with smart auto-suggested defaults. It's no surprise users annotate faster with Roboflow.

Next Steps after Labeling Your Computer Vision Dataset in CVAT

Once your dataset is labeled in CVAT, it is time to move on to creating your computer vision model!

Roboflow makes it easy to load in your data (just drag and drop your images and your annotation file from CVAT). You can generate even more data with augmentations such as flipping images, random cropping, and creating synthetic computer vision data. If you are interested in using data augmentation to increase the number of your training images (and spend less time in CVAT), this is a good guide on using data augmentation in computer vision.

When you are ready, use Roboflow Train to train a model in one click and quickly test your model using our web app or your webcam. Alternatively, you can export your data from Roboflow to any format and start training your computer vision model.

Our posts on How to Train YOLOv4 and How to Train EfficientDet are good starting points to train your model and then from model evaluation, you can gauge how much more data you may need to collect and annotate.

Output of model inference finding bounding boxes for docks and lifts in aerial photo taken from a drone.
Inference after training only 74 aerial drone images