This guide will take you all the way from unlabeled images to a working computer vision model deployed and running live inference at 15 FPS on the affordable and scalable Luxonis OpenCV AI Kit (OAK) device.

The best part of this tutorial is that you can get started today, before your OAK arrives, by training your custom model so you are ready to deploy the moment it shows up!

We've trained an initial model to have the OAK recognize chess pieces in real time at 15 FPS.

Buckle up and let's get started.

Don't forget to subscribe to our YouTube channel for more free computer vision content!

About the Luxonis OpenCV AI Kit

The OpenCV AI Kit (OAK), crafted by the company Luxonis, gathered significant momentum over the course of its Kickstarter, and the hype is certainly shared here at Roboflow. Yesterday, we unwrapped the OAK-1, and today, in one single day, we have a custom object detector running live inference in 4K on a device that easily sits in the palm of your hand. Here is a photo of the whole setup.

Our OAK demonstration setup - detecting chess pieces in realtime

The OAK line is known for using depth (via the stereo cameras on the OAK-D) as an additional input to computer vision models. The OAK-1 we use here comes with a 4K camera and a Myriad X Vision Processing Unit (VPU), where all of the neural network operations occur. The OAK is small and processes quickly, making it an ideal choice for deploying computer vision models at scale.

Roadmap of the Tutorial

Let's take a quick overview of the roadmap of this tutorial, so you know where we are heading.

Steps taken in this tutorial

In the length of a single blog post, we will deploy a custom object detection model to the OAK by:

- Gathering and labeling a dataset of images
- Preprocessing and augmenting the dataset in Roboflow
- Training a custom YOLOv3-tiny detector in Darknet
- Converting the trained model through TensorFlow and OpenVino to a .blob file
- Deploying the custom model on the OAK device with DepthAI

These steps leverage the following free (except for the OAK itself) tools:

- Roboflow for dataset upload, preprocessing, and augmentation
- Google Colab for free GPU training
- Darknet as the training framework
- OpenVino for model conversion
- DepthAI for deployment on the OAK

It is important to remember that each of these pieces is replaceable, but if you decide to swap one out, you need to be sure the new piece fits with the one before it and the one after it.

Building a Dataset

In order to teach your model to detect your custom objects, you need to first gather training data to supervise the model.

If you already have a labeled object detection dataset, you can skip this section.

If you just want to get a feel for the technology presented here and follow along with the guide, we will be detecting chess pieces. The chess dataset is hosted publicly on Roboflow.

Gathering Images

Our model has the capacity to detect any object in the world, so do not limit your imagination. However, in order to get a performant model, you should limit the complexity of the domain that your model needs to learn. A lot of that narrowing can happen during the selection and structuring of your vision task.

We recommend gathering at least 50 images of your detection scene to get started. You can always scale it up to improve model performance later.

Labeling Images

Once you have gathered a training dataset of images, you need to label those images with bounding boxes to teach your model to locate and tag objects in the scene.

Here are great guides we've written for labeling object detection datasets in a variety of free, open-source tools:

Leverage Roboflow for Preprocessing and Augmentation

Once you have your images labeled, you can drag and drop your dataset into Roboflow.

First, create a free account. Then, you can drag the files in your dataset export into Roboflow. If your dataset came from another place, never fear, Roboflow supports over 30 annotation formats.

Uploading data to Roboflow

Once your data is in Roboflow, you will be able to choose preprocessing and augmentation options. Preprocessing standardizes your images to the size and shape your model needs in training and at inference. Augmentation creates additional images from your base dataset, allowing you to increase your model's performance while spending less time gathering and annotating images.

Choosing Preprocessing and Augmentation options in Roboflow

Once you've chosen your preprocessing and augmentation settings, hit Generate; then, on the right, you will see an option to Export Your Dataset and Get Link. We will export to YOLO Darknet format.

Choosing the YOLO Darknet export format in Roboflow.

You will receive a curl link. Keep it handy for the next step as we begin to train our custom object detector.
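For reference, the download command will look something like this (the URL below is a placeholder; use the exact link Roboflow generates for you):

# download and unpack the dataset export (URL is a placeholder)
curl -L "https://app.roboflow.com/ds/YOUR-EXPORT-LINK" > roboflow.zip
unzip roboflow.zip; rm roboflow.zip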

Training a Custom Object Detector in Darknet

Darknet is a deep learning library where some of the best object detectors in the game have been designed and implemented. One of these is YOLOv3-tiny, which we will be using today. Why not use YOLOv4 or YOLOv5? The short answer is they are not yet easily compatible with OpenVino, which we will need to run inference on the OAK device.

Go ahead and jump to this Colab Notebook to Train YOLOv3-tiny and Deploy to OAK. Save a copy in drive so you can edit your own version.

This notebook and guide are heavily based on my guide on how to train YOLOv4-tiny in Darknet, so head over there if you need more details along the way.

Cloning and Installing Darknet

OK, so the best part about this notebook is that I have hard-coded the helper repository hashes for you, so you don't have to worry about all the new progress being made in Darknet breaking your code. Let's go!

We clone Darknet, edit the Makefile to enable the GPU and cuDNN, and then make Darknet. You will know that your install is successful if you see a line starting with this printout at the end:

g++ -std=c++11 -Iinclude/ -I3rdparty/stb/include -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wfatal-errors -Wno-unused-r
Printout showing that Darknet has compiled successfully
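For the curious, the setup cells boil down to something like the following sketch (the notebook pins the exact repository and commit hash, so defer to it):

git clone https://github.com/AlexeyAB/darknet
cd darknet
# enable GPU, cuDNN, and OpenCV support in the Makefile
sed -i 's/GPU=0/GPU=1/' Makefile
sed -i 's/CUDNN=0/CUDNN=1/' Makefile
sed -i 's/OPENCV=0/OPENCV=1/' Makefile
# compile Darknet
make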

Importing our Dataset From Roboflow

In the steps above you acquired a curl link from Roboflow. It is time to copy and paste that link in the notebook. If you jumped past that section, you can get the import curl link from the public chess dataset.

When the curl command runs, you will see your dataset downloading into the notebook.

Downloading our dataset version from Roboflow in the Colab Notebook.

Next, there is some Python code to route that dataset to the correct location. You can run those cells without too much inspection.

Writing a Custom Training Configuration File

Next, we write a custom training configuration file for our custom dataset. We take the base YOLOv3-tiny configuration file and make some edits from there.

max_batches determines how long your training job runs. The recommended setting is 2000 * num_classes, but feel free to adjust from there. If you get a GPU out-of-memory error at train time, increase the number of subdivisions at the top of the file.

Additionally, the network configuration needs to adjust dynamically based on num_classes. The notebook configures this automatically for you, so you should not have to worry about that step.

writing config for a custom YOLOv4 detector detecting number of classes: 12

Go ahead and take a look at the configuration file with %cat cfg/custom-yolov3-tiny-detector.cfg and make sure everything looks alright.
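If you would rather spot-check only the dataset-dependent values, a quick grep works. Keep in mind that the filters value in the convolutional layer just before each [yolo] block must equal (num_classes + 5) * 3:

# list the config lines that depend on your dataset
grep -nE "classes|filters|max_batches" cfg/custom-yolov3-tiny-detector.cfg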

Start Darknet Custom Training Job

Next, we will start training the network with our custom dataset.

Training a custom YOLOv3-tiny detector in Darknet
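The training cell reduces to a Darknet command along these lines (a sketch; the exact filename of the pretrained backbone weights comes from the notebook):

# -dont_show suppresses display output in Colab; -map logs validation mAP during training
./darknet detector train data/obj.data cfg/custom-yolov3-tiny-detector.cfg yolov3-tiny.conv.11 -dont_show -map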

The detector will make passes over the dataset, breaking it into batches and automatically tweaking its parameters to best fit your data. Watch for the mean average precision (mAP) metric to rise. If you are curious, here is a guide on what mAP means in machine learning.

Infer With Custom Weights

The training job will save the best weights for your detector based on the validation set's mAP metric. We can take those weights and use them for inference on a test image. This is an important step along the way. You want to make sure your detector is in good shape before you go through the work to deploy it onto your OAK device.
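The inference call looks something like this (a sketch; substitute a real image path from your test set):

# run the best checkpoint against a single held-out image
./darknet detector test data/obj.data cfg/custom-yolov3-tiny-detector.cfg backup/custom-yolov3-tiny-detector_best.weights test/example.jpg -dont_show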

Running inference on a test image that the model has never seen before

We can see here that our model has done a good job of learning to tag chess pieces on a chess board. It is ready to move forward!

Convert Custom Darknet Model to OpenVino

We are continuing in the Train YOLOv3-tiny and Deploy to OAK Colab notebook.

Darknet is a useful framework for training and research, but it is not optimized to run on the Myriad X VPU hardware that is present on the Luxonis OAK-1. So, we need to translate our Darknet model to Intel's OpenVino runtime model framework. To do so, we will first convert to TensorFlow .pb and then on to OpenVino.

Again, the commit hashes are saved so this should run smoothly for you as you apply it to your own custom model.

These steps rely on OpenVino's documentation on converting YOLO models to OpenVino.

Convert model to .pb

To convert our model to .pb we leverage the repo from mystic123. To convert, we need to point the conversion script to our weights file and the list of class names, obj.names.
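The conversion call looks roughly like this (flag names follow the mystic123/tensorflow-yolo-v3 README; the weights filename is an example):

# produce a frozen TensorFlow graph (frozen_darknet_yolov3_model.pb by default)
python3 convert_weights_pb.py --class_names obj.names --weights_file custom-yolov3-tiny-detector_best.weights --data_format NHWC --tiny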

Install OpenVino

Next up we install OpenVino in our notebook. This can take up to 10 minutes ⏰.

Convert .pb to OpenVino .blob

Next, we write yolo_v3_tiny.json to define some special parameters for the OpenVino conversion. Then we execute a script to convert our model to .xml and .bin files. After that, we post the location of those files to the OpenVino API and get back a .blob version of our model. The .blob is what we will need to take with us to the OAK for deployment. In the notebook, we .tar the file and download it.
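The .xml/.bin step runs through OpenVino's Model Optimizer and looks roughly like this (a sketch based on OpenVino's YOLO conversion docs; the mo_tf.py path varies by OpenVino version and install location):

# convert the frozen TensorFlow graph to OpenVino IR (.xml + .bin), in FP16 for the Myriad X
python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py --input_model frozen_darknet_yolov3_model.pb --tensorflow_use_custom_operations_config yolo_v3_tiny.json --batch 1 --data_type FP16 --reverse_input_channels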

Deploy Custom OpenVino YOLOv3-tiny Model to OAK Device

Now that you have your YOLOv3-tiny custom model in .blob format, it is time to put it on device!

In order to use the OAK-1, you will need a host system that can accept USB input. DepthAI (the deployment software environment) supports Ubuntu, Raspbian, and macOS; Windows support is currently in beta. The safest OS to go with is Ubuntu, which we use in this tutorial. You may need to get a host device running Ubuntu or Raspbian to support your OAK-1. Luxonis also offers editions that come with a built-in Raspberry Pi, but the OAK-1 and OAK-D are not those offerings.

Once our host device is set up, we need only plug the OAK into a USB port to begin.

Plugging in the OAK-1 device!

Now let's set up the software libraries to run our custom model on the OAK device.

Installing DepthAI

DepthAI is Luxonis's software library for running computer vision models. We need to install it to get our model running. Installation instructions are available in DepthAI's installation guides.

Open up the terminal. In Ubuntu, run these commands to install Python, DepthAI, and the DepthAI Python requirements, and to point your system to the OAK device.

sudo apt install git python3-pip python3-opencv libcurl4 libatlas-base-dev libhdf5-dev libhdf5-serial-dev libjasper-dev libqtgui4 libqt4-test
echo 'SUBSYSTEM=="usb", ATTRS{idVendor}=="03e7", MODE="0666"' | sudo tee /etc/udev/rules.d/80-movidius.rules
sudo udevadm control --reload-rules && sudo udevadm trigger
git clone https://github.com/luxonis/depthai.git
cd depthai
python3 -m pip install -r requirements.txt

To ensure your install is working, run:

python3 test.py

This should start up a model with the default COCO (Common Objects in Context) weights, which can detect 80 common object classes. Time to specialize it to our custom objects!

Running YOLOv3-tiny on DepthAI

The available models are present in the DepthAI folder under resources/nn/. The model we are interested in replicating is in resources/nn/tiny-yolo. Right now, there is a simple mask/no-mask detector in there. Kick it off with:

python3 test.py -cnn tiny-yolo

Try on your mask and see if it detects it!

Using the pretrained mask-no-mask detector in DepthAI on the OAK-1 device

Running a custom YOLOv3-tiny on DepthAI

Now we want to customize the YOLOv3-tiny model with our own weights. Make a copy of the resources/nn/tiny-yolo folder to resources/nn/tiny-yolo-copy so we can revert to it if need be. In the resources/nn/tiny-yolo folder you will see two files, a .blob file and a .json file.

Start by overwriting the .blob file with your own custom .blob file downloaded from the notebook.

Next, overwrite the .json file with your class names.
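In shell terms, the swap amounts to something like this (a sketch; the custom filenames are examples standing in for whatever you downloaded from the notebook):

# keep a pristine backup of the stock model folder
cp -r resources/nn/tiny-yolo resources/nn/tiny-yolo-copy
# drop in the custom model and its class-name config
cp ~/Downloads/custom-yolov3-tiny.blob resources/nn/tiny-yolo/tiny-yolo.blob
cp ~/Downloads/custom-tiny-yolo.json resources/nn/tiny-yolo/tiny-yolo.json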

Now, major hack alert! The depthai/depthai_helpers/tiny_yolo_v3_handler.py file has three classes hard-coded in it. Change this file to reflect your own number of classes. Furthermore, we need to edit the neural net output size to accept additional classes, if your custom dataset has them. We do this by increasing the second output shape for both outputs in the decode_tiny_yolo method, and by increasing the same two values in the tiny-yolo.json file. For the chess dataset, with 12 classes, we needed to increase this number from 24 to 51. Rather than finding this number by trial and error, note that it follows the same formula as the filters value in the Darknet config, (num_classes + 5) * 3: that gives (3 + 5) * 3 = 24 for the stock three classes and (12 + 5) * 3 = 51 for our twelve. We will be updating this portion of the tutorial as a more flexible solution is encoded.

And there you have it! Once you have customized tiny-yolo.blob, tiny-yolo.json, and tiny_yolo_v3_handler.py, run:

python3 test.py -cnn tiny-yolo

and you will see the OAK-1 device detecting your custom objects live at 15 FPS.

Our custom model is up and running live at 15 FPS on the OAK device

From here, you may need to adjust tiny_yolo_v3_handler.py slightly to process detections as you see fit. Perhaps you want to post them directly to a cloud bucket or send notifications via an application. The rest is up to you 🚀

Conclusion

Congratulations! In this tutorial, you have learned how to go from unlabeled images to a trained custom computer vision model running live at 15 FPS on the Luxonis OAK-1 device. This is an extremely powerful feat considering the cost and scalability of this hardware.

As always, happy detecting.