VoTT for Image Annotation and Labeling

Annotate Images Online Using VoTT

In order to train computer vision models, we need to provide our models with supervision in the form of labeled data. As we show more and more labeled data to our model, the model begins to learn the underlying patterns in our labeling decisions.

After the training process is complete, we can deploy our object detection model for automatic inference.

Labeling a thermal object detection dataset using Microsoft VoTT

Large scale labeling solutions exist, but they are costly. Roboflow is free for students, hobbyists, and personal projects. If you are starting a new computer vision project, you might prefer a "do it yourself" (DIY) labeling approach to assemble the first version of your dataset.

As computer vision models get better and better, it may take as few as 10-50 images to get the first version of your model off the ground.

VoTT image labeling and annotation guide

Enter Microsoft VoTT (Visual Object Tagging Tool), a free, open source annotation tool for computer vision.

The VoTT ecosystem

Gather Images to Label, Tag, and Annotate

Before starting your labeling job, you must first gather a corpus of unlabeled images for your dataset. We recommend narrowing the domain of your dataset as much as possible to ensure successful modeling results. That is, try to control all of the environmental factors you can control to make the coming task easier on your model.

Microsoft VoTT supports importing images from your local drive, as well as from Microsoft's own Bing Image Search and Azure Blob Storage. In this tutorial, we will import our dataset from a local drive.
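
Before creating a project, it can help to confirm how many images are actually in the folder you plan to import. The snippet below is a minimal sketch and not part of VoTT: count_images is a hypothetical helper, ./thermal-images is a placeholder path, and the extension list is an assumption you should adjust to match your data.

```shell
# count_images DIR: print the number of image files under DIR (recursively).
# The extension list is an assumption; edit it to match your dataset.
count_images() {
  find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) \
    | wc -l | tr -d ' '
}

# Example (hypothetical folder name):
#   count_images ./thermal-images
```

If the number is far from what you expect, check for images in unexpected formats or nested folders before pointing VoTT at the directory.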

Installing VoTT Software

Once you have an unlabeled corpus of images, you are ready to install the VoTT labeling software.

Standing Up VoTT Locally

If you have your data in Azure Blob Storage or you are using Bing Image Search, you can go ahead and use VoTT directly through its web version.

If you have your images saved to your local drive, it will be easier to start VoTT on your local machine.

Download VoTT Installer

The easiest way to install VoTT locally is to use the installation packages attached to each release. Installation packages are listed for VoTT on Mac OSX, VoTT on Linux, and VoTT on Windows.

Navigate to the Assets box, and download the file you need for your operating system.

VoTT installation packages.

I am installing VoTT on Mac OSX, so I will drag VoTT over to my Applications folder.

Installing VoTT on Mac OSX

All set ✅

VoTT launch page 🚀

Optional: Compile VoTT from Source

If you want to make tweaks to VoTT, you may want to compile and run the VoTT tool from source.

To compile VoTT from source, you first need to install Node.js and npm. Download and run the Node.js installer. You will know the installation succeeded when node -v and npm -v each print a version number.
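
As a rough sketch of that check, the snippet below reports whether each prerequisite is on your PATH and prints its version. check_tool is a hypothetical helper written for this guide, not something shipped with VoTT or Node.js. Including git is also an assumption, made because the build steps clone the repository.

```shell
# check_tool NAME FLAG: report whether NAME is on PATH and, if it is,
# print the first line of its version output (obtained by running NAME FLAG).
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '%s found: %s\n' "$1" "$("$1" "$2" 2>&1 | head -n 1)"
  else
    printf '%s not found; install it before building VoTT\n' "$1"
    return 1
  fi
}

# Check each prerequisite, continuing even when one is missing.
check_tool node -v || true
check_tool npm -v || true
check_tool git --version || true
```

If any line reports "not found", install that tool and open a new terminal before continuing.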

Then, to start VoTT, run the following commands in a directory of your choosing (the repository will be cloned to your local machine):

 git clone https://github.com/Microsoft/VoTT.git   # download the source
 cd VoTT
 npm ci      # install the exact dependency versions from package-lock.json
 npm start   # launch VoTT

You will see a lot of log output as npm sets up the project. This is expected.

Printouts as VoTT builds locally

Again, if you're just getting started, I recommend using the VoTT install packages rather than building from source.

Starting an Image Annotation Project in VoTT

Once you have started VoTT in the location of your choice (locally installed, built from source, or cloud server), you can go ahead and start your labeling project.

Click New Project and fill in the relevant fields:

Creating a labeling project in VoTT

For Source Connection, point to the folder on your drive that contains the raw image dataset.

Mapping a local dataset import to VoTT

Once you have kicked off your project, you will see your images in the tool, ready for labeling.

Thermal images loaded into VoTT for labeling and annotation

How to Use VoTT Labeling Shortcuts

In order to label a dataset quickly, you will want to leverage the shortcuts available in VoTT.

You can draw a box just by clicking and dragging. You can also start a box by typing capital R on the keyboard.

Each class label is assigned a number hotkey, so you can press that number to assign the selected box to the correct class.

You can move through images by using the arrow keys.

Ctrl + S (or Cmd + S on a Mac) saves your progress.

VoTT Labeling Best Practices

When you are labeling images in VoTT, keep these best practices in mind. Think ahead to your modeling task: any errant annotations or ambiguities should be resolved during labeling, before they reach your model.

In general, the following practices should be followed:

1) Label around the entire object

2) Keep bounding boxes tight to the object

3) Label occluded objects by drawing a box around the whole object

4) Label objects that are partially out of frame

5) Beware of choosing class labels that often overlap

Labeling best practices video guide

Exporting Data from VoTT

Once your dataset is fully labeled, you can hit Ctrl or Cmd + S to save your progress.

Navigate to the export button on the left side of the tool.

Saving export settings in VoTT

In the Provider field, choose the annotation format you would like to export. We recommend exporting Pascal VOC and then loading the result into Roboflow, which can convert the dataset to any annotation format. Each model expects a specific object detection annotation format, so you will likely need to convert your VOC XML files to another format.

Then click Save Export Settings.

Finally, to export your dataset, click the export button in the top pane. The dataset will export to the location you provided when you set up the dataset.
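
After exporting, you may want to confirm that every exported image has a matching annotation file. The sketch below assumes the export folder follows the conventional Pascal VOC layout, with images in a JPEGImages subfolder and XML files in an Annotations subfolder. Check your own export, since the layout and file naming are assumptions here. missing_annotations is a hypothetical helper, and it accepts either name.xml or name.jpg.xml as a match because the naming was not verified.

```shell
# missing_annotations EXPORT_DIR: print the name of every file in
# EXPORT_DIR/JPEGImages that has no XML file in EXPORT_DIR/Annotations.
# Assumed layout: conventional Pascal VOC folders (verify against your export).
missing_annotations() {
  export_dir=$1
  for img in "$export_dir"/JPEGImages/*; do
    [ -f "$img" ] || continue          # skip if the folder is empty
    file=$(basename "$img")
    stem=${file%.*}                    # file name without its extension
    if [ ! -f "$export_dir/Annotations/$stem.xml" ] &&
       [ ! -f "$export_dir/Annotations/$file.xml" ]; then
      printf '%s\n' "$file"
    fi
  done
}

# Example (hypothetical folder name):
#   missing_annotations ./my-project-export
```

No output means every image found has an annotation file. Any names printed are worth reopening in VoTT.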

How Does VoTT Compare to Roboflow?

When deciding whether to launch your annotation project in VoTT, it is worth considering VoTT's strengths and weaknesses relative to Roboflow.

Use Your Labeled Dataset from VoTT

Once you have labeled your dataset in VoTT, we recommend uploading your dataset in Pascal VOC format to Roboflow.

From there, you can check the health of your computer vision dataset, manage class labels, export to any dataset format, and use state of the art computer vision models.
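
For a quick local look at class balance before uploading, you can tally the class names in your Pascal VOC XML files. This is only a rough sketch: count_labels is a hypothetical helper, the Annotations subfolder is an assumed layout, and the text match relies on object class names appearing in name elements, as they do in standard Pascal VOC files. It is not a substitute for a real XML parser.

```shell
# count_labels EXPORT_DIR: print how many times each class name appears
# across EXPORT_DIR/Annotations/*.xml, most frequent first.
count_labels() {
  grep -oh '<name>[^<]*</name>' "$1"/Annotations/*.xml \
    | sed -e 's/<name>//' -e 's/<\/name>//' \
    | sort | uniq -c | sort -rn
}

# Example (hypothetical folder name):
#   count_labels ./my-project-export
```

A class with only a handful of boxes, or two near-identical spellings of the same class, is usually easier to fix at this stage than after training.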

With Roboflow, you can generate artificial training data so you can spend less time collecting and labeling, and more time training and deploying your computer vision model.

Data augmentation strategies in Roboflow include flipping images, random cropping, creating synthetic computer vision data, and much, much more.

Depending on the model you choose to train, you may need to convert your VoTT dataset into other formats. We make it easy (and free) to convert VoTT JSON or CSV exports to 15 other formats. Popular conversions are to CreateML, OpenAI CLIP, YOLOv4, and COCO.

Happy building.