A guide on how to label your own computer vision dataset using Microsoft VoTT.

Labeling a thermal object detection dataset using Microsoft VoTT

To train computer vision models, we need to supervise them with labeled data. As we show the model more and more labeled data, it begins to learn the underlying patterns in our labeling decisions. After training is complete, we can deploy our object detection model for automatic inference.

YouTube video detailing VoTT.

Large-scale labeling solutions exist, but they are costly. If you are starting a new computer vision project, you might prefer a "do it yourself" (DIY) approach to assemble the first version of your dataset. As computer vision models get better and better, it may take as few as 10-50 images to get the first version of your model off the ground.

Enter Microsoft VoTT, a free, open source annotation tool for computer vision - a "Visual Object Tagging Tool" if you will.

The VoTT ecosystem

Gather Training Images

Before starting your labeling job, you must first gather a corpus of unlabeled images for your dataset. We recommend narrowing the domain of your dataset as much as possible to ensure successful modeling results. That is, try to control all of the environmental factors you can control to make the coming task easier on your model.
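
Before importing a folder into a labeling tool, it can help to take a quick inventory of what is in it. The following is a minimal sketch, not part of VoTT: the extension list and the function names `list_images` and `summarize` are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever your camera or source produces.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(folder):
    """Return the image files directly inside `folder`, sorted by path."""
    root = Path(folder)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def summarize(paths):
    """Count files per lowercase extension so unexpected formats stand out."""
    counts = {}
    for p in paths:
        ext = p.suffix.lower()
        counts[ext] = counts.get(ext, 0) + 1
    return counts
```

Calling `summarize(list_images("thermal_images"))` on a hypothetical `thermal_images` folder would show how many files of each type you are about to label, which is a cheap way to catch stray files before you start.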

Microsoft VoTT supports importing images from your local drive and, naturally for a Microsoft tool, from Bing Image Search and Azure Blob Storage. In this tutorial, we will import our dataset from a local drive.

Installing VoTT Software

Once you have an unlabeled corpus of images, you are ready to install the VoTT labeling software.

Standing Up VoTT Locally

If you have your data in Azure Blob Storage or you are using Bing Image Search, you can go ahead and use VoTT directly through the VoTT website.

If you have your images saved to your local drive, it will be easier to start VoTT on your local machine.

Download VoTT Installer

The easiest way to install VoTT locally is by using the installation packages attached to each release. Installation packages are listed for VoTT on Mac OSX, VoTT on Linux, and VoTT on Windows.

Navigate to the Assets box, and download the file you need for your operating system.

VoTT installation packages.

I am installing VoTT on Mac OSX, so I will drag VoTT over to my Applications folder.

Installing VoTT on Mac OSX

All set ✅

VoTT launch page 🚀

Optional: Compile VoTT from Source

If you want to make tweaks to VoTT, you may want to compile and run the VoTT tool from source.

To compile VoTT from source, you first need to install NodeJS and NPM. Download and run the install file. You will know the install succeeded when node -v and npm -v each print a version number.

Then, to start the VoTT tool, run the following commands in the directory of your choosing (VoTT will be downloaded to your local machine):

 git clone https://github.com/Microsoft/VoTT.git
 cd VoTT
 npm ci
 npm start

You will see a lot of console output while npm sets up the project.

Printouts as VoTT builds locally

Again, if you're just getting started, I recommend using the VoTT install packages rather than building from source.

Starting a Project in VoTT

Once you have started VoTT in your location of choice (locally installed, built from source, or cloud server), you can go ahead and start your labeling project.

Click New Project and fill in the relevant fields:

Creating a labeling project in VoTT

For Source Connection, point to the folder on your drive that contains the raw image dataset.

Mapping a local dataset import to VoTT

Once you have kicked off your project, you will see your images in the tool, ready for labeling.

Our thermal images loaded into VoTT for labeling

VoTT Labeling Shortcuts and Tricks

To label your dataset quickly, you will want to use the shortcuts available in VoTT.

You can draw a box just by clicking and dragging. You can also start a box by typing capital R on the keyboard.

Each class label is bound to a number hotkey, so you can just press that number to assign the box to the correct class.

You can move through images by using the arrow keys.

Ctrl or Cmd + S saves your progress.

Labeling Best Practices

When you are labeling images in VoTT, keep these best practices in mind. You are labeling for your downstream modeling task, so any errant annotations or ambiguities should be resolved during labeling rather than left for the model to absorb.

In general, the following practices should be followed:

1) Label around the entire object

2) Keep bounding boxes tight to the object

3) Label occluded objects by drawing a box around the whole object

4) Label objects that are partially out of frame

5) Beware of choosing class labels that often overlap
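
Some of these practices can be spot-checked after the fact. The sketch below is an illustration under stated assumptions, not a VoTT feature: boxes are `(xmin, ymin, xmax, ymax)` tuples in pixels, and the helper names and the 0.8 overlap threshold are hypothetical. It clamps boxes for objects that run out of frame and flags pairs of differently named labels whose boxes nearly coincide, which can hint at class names that overlap.

```python
def clip_box(box, width, height):
    """Clamp an (xmin, ymin, xmax, ymax) box to the image frame."""
    xmin, ymin, xmax, ymax = box
    return (
        max(0, min(xmin, width)),
        max(0, min(ymin, height)),
        max(0, min(xmax, width)),
        max(0, min(ymax, height)),
    )


def box_area(box):
    """Area of a box; inverted or empty boxes count as zero."""
    xmin, ymin, xmax, ymax = box
    return max(0, xmax - xmin) * max(0, ymax - ymin)


def iou(a, b):
    """Intersection over union of two boxes, in the range 0.0 to 1.0."""
    intersection = box_area(
        (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    )
    union = box_area(a) + box_area(b) - intersection
    return intersection / union if union > 0 else 0.0


def cross_class_overlaps(labels, threshold=0.8):
    """Return name pairs of differently named labels whose boxes overlap heavily.

    `labels` is a list of (class_name, box) tuples for one image.
    """
    flagged = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (name_a, box_a), (name_b, box_b) = labels[i], labels[j]
            if name_a != name_b and iou(box_a, box_b) >= threshold:
                flagged.append((name_a, name_b))
    return flagged
```

If the same pair of class names is flagged across many images, that is a sign the two classes may be describing the same thing and are worth merging or redefining.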

Exporting Data from VoTT

Once your dataset is fully labeled, you can hit Ctrl or Cmd + S to save your progress.

Navigate to the export button on the left side of the tool.

Saving export settings in VoTT

In Provider, choose the annotation format you would like to export. We recommend outputting Pascal VOC and then loading into Roboflow for dataset conversion to any annotation format. Each model expects a specific object detection annotation format, so you will likely need to convert your VOC XML files to another format.
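
To make the conversion step concrete, here is a minimal sketch of reading one Pascal VOC XML annotation and converting a box to a normalized center format (the layout used by YOLO-style text labels, chosen here only as an example target). The function names are hypothetical, and coordinates are parsed as floats on the assumption that an exporter may write fractional pixel values.

```python
import xml.etree.ElementTree as ET


def parse_voc(xml_text):
    """Return (width, height, [(name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = float(size.findtext("width"))
    height = float(size.findtext("height"))
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            float(bndbox.findtext("xmin")),
            float(bndbox.findtext("ymin")),
            float(bndbox.findtext("xmax")),
            float(bndbox.findtext("ymax")),
        ))
    return width, height, boxes


def voc_to_normalized_center(box, width, height):
    """Convert (xmin, ymin, xmax, ymax) pixels to (x_center, y_center, w, h) in 0..1."""
    xmin, ymin, xmax, ymax = box
    return (
        (xmin + xmax) / 2 / width,
        (ymin + ymax) / 2 / height,
        (xmax - xmin) / width,
        (ymax - ymin) / height,
    )


# Made-up sample annotation for illustration only.
SAMPLE_XML = """
<annotation>
  <filename>frame_0001.jpg</filename>
  <size><width>640</width><height>512</height><depth>3</depth></size>
  <object>
    <name>person</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>
"""
```

Parsing `SAMPLE_XML` yields one `person` box, and passing its coordinates through `voc_to_normalized_center` gives values relative to the image size rather than absolute pixels. A real conversion would also need to map class names to integer ids, which is left out here.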

Then click Save Export Settings.

Finally, to export your dataset, click the export button in the top pane. The dataset will export to the location you provided when you set up the dataset.
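
After exporting, a quick consistency check can confirm that images and annotation files line up. This is a sketch under the assumption that each image has an XML file with the same base name; the function name `unpaired` is hypothetical, and you would pass in the file names found in your own export folder.

```python
from pathlib import Path


def unpaired(image_names, annotation_names):
    """Return (images lacking an XML file, XML files lacking an image), by base name."""
    image_stems = {Path(name).stem for name in image_names}
    xml_stems = {Path(name).stem for name in annotation_names}
    return sorted(image_stems - xml_stems), sorted(xml_stems - image_stems)
```

Images you deliberately left unlabeled may legitimately have no annotation file, so treat the first list as a prompt to double check rather than as an error.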

Next Steps after Labeling Your Dataset in VoTT

Once you have labeled your dataset in VoTT, we recommend uploading your dataset in Pascal VOC format to Roboflow. From there, you can check the health of your computer vision dataset, manage class labels, export to any dataset format, and use state-of-the-art computer vision models.

With Roboflow, you can generate artificial training data so you can spend less time collecting and labeling, and more time training and deploying your computer vision model. Data augmentation strategies in Roboflow include flipping images, random cropping, creating synthetic computer vision data, and much, much more.
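
To see why augmentation has to touch labels as well as pixels, consider a horizontal flip. The sketch below is a generic illustration, not Roboflow's implementation: the function names are hypothetical, boxes are `(xmin, ymin, xmax, ymax)` with coordinates treated as continuous edge positions, and the image is represented as a plain list of pixel rows.

```python
def flip_box_horizontal(box, width):
    """Mirror a box across the vertical center line of an image `width` pixels wide."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, and vice versa.
    return (width - xmax, ymin, width - xmin, ymax)


def flip_rows_horizontal(rows):
    """Mirror an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in rows]
```

Flipping the box twice returns the original coordinates, which makes a handy sanity check when you write augmentations of your own.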

Happy building!