Object detection models do a great job of detecting objects in a frame, but they do not track objects across frames in a video or camera stream. If you want to keep track of objects (perhaps to count the number of distinct objects in a video) you need to do object tracking.
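To make this concrete, here is a minimal, hypothetical sketch of the core problem a tracker solves: assigning consistent IDs to detections across frames. This toy example matches boxes greedily by IoU overlap (the tracker in this tutorial instead compares appearance features, as explained below), and every box and threshold here is made up for illustration:

```python
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def assign_ids(tracks, detections, next_id, thresh=0.5):
    """Greedily match detections to existing tracks by IoU.

    Unmatched detections start new tracks with fresh IDs."""
    updated = {}
    for det in detections:
        best_id, best_iou = None, thresh
        for tid, box in tracks.items():
            if tid in updated:
                continue  # each track matches at most one detection
            score = iou(box, det)
            if score > best_iou:
                best_id, best_iou = tid, score
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        updated[best_id] = det
    return updated, next_id

# Frame 1: two cars detected, so they get IDs 0 and 1
tracks, next_id = assign_ids({}, [(0, 0, 10, 10), (20, 0, 30, 10)], 0)
# Frame 2: both cars moved slightly; the same IDs persist
tracks, next_id = assign_ids(tracks, [(1, 0, 11, 10), (21, 0, 31, 10)], next_id)
```

With detection alone, each frame would report "two cars" with no notion that they are the same two cars; the persistent IDs are what let you count distinct objects.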

Example custom object tracking on fish in an aquarium

This post is a comprehensive guide on how to implement object tracking with your object detection model to track your custom objects.

Here is a link to the Object Tracking Colab Notebook. We recommend having this open next to this blog post.

How Object Tracking Works in this Tutorial

Previously, object tracking required a separate featurizer model to extract object features for similarity comparison. With this tutorial, you only need to bring your object detection model; features are extracted with a general-purpose model. For more details, check out this post on zero-shot object tracking.
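As an illustration of why a general featurizer is enough, here is a sketch of similarity comparison between embeddings. The vectors below are made-up numbers, not real CLIP outputs; in practice each would be the embedding of a cropped detection:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings (made-up numbers, not real CLIP outputs)
track_feature = np.array([0.9, 0.1, 0.0])     # stored embedding for an existing track
same_object   = np.array([0.85, 0.15, 0.05])  # crop of the same object in the next frame
new_object    = np.array([0.1, 0.2, 0.95])    # crop of a different object

# The detection most similar to the track's stored embedding keeps that track's ID
print(cosine_similarity(track_feature, same_object))  # high, close to 1.0
print(cosine_similarity(track_feature, new_object))   # low
```

Because the featurizer is generic, the same similarity logic works whether your detector finds cars, fish, or playing cards.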

Training Your Object Detection Model

To track your custom objects, you need to first train an object detection model. Currently our object tracking repository supports two options - training a custom YOLOv5 object detection model or using Roboflow's one-click training solution.

Once you have your model trained with either of these options, you are ready to move onto the Object Tracking Colab Notebook. Note: Save a copy in your Drive!

Implementing Object Tracking with Your Object Detection Model

To start, we will clone the zero-shot object tracking repository.
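The Colab notebook contains the exact clone command; as a sketch (we assume the repository URL here, so use the one in the notebook if it differs):

```shell
!git clone https://github.com/roboflow-ai/zero-shot-object-tracking.git
%cd zero-shot-object-tracking
```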

Taking a look inside the repository, we see the following videos available for testing:

cars.mp4  fish.mp4

You can import your own video into Colab for testing by clicking the folder icon and then the upload icon. In this tutorial, we will run the generic COCO model on cars.

We can view this video in Colab with the following code (you'll have to manually accept the Google Auth prompt).

# You need to follow the authentication link to view the video
!pip install -U kora
from kora.drive import upload_public
url = upload_public('data/video/cars.mp4')
# then display it
from IPython.display import HTML
HTML(f"""<video src="{url}" width=500 controls/>""")
An example video of cars driving down a street - before object tracking predictions have been made.

Then we clone CLIP for our general zero-shot object featurizer.

!git clone https://github.com/openai/CLIP.git CLIP-repo
!cp -r ./CLIP-repo/clip ./clip

Install some dependencies:

!pip install --upgrade pip
!pip install -r requirements.txt
!pip install ftfy

And with that, we are ready to run object tracking on top of our object detection model. Point clip_object_tracker.py at your video of choice and choose which detection engine you want to use.

YOLOv5 (to run your own custom model, specify --weights):

!python clip_object_tracker.py --source ./data/video/cars.mp4 --detection-engine yolov5

Roboflow Inference API (to run your own model, specify your model URL):

!python clip_object_tracker.py --source data/video/cards.mp4 --url https://detect.roboflow.com/playing-cards-ow27d/1 --api_key ROBOFLOW_API_KEY

The script will sequentially process frames with detection and object tracking predictions.

[Detections]
1 persons, 8 cars, 2 trucks, 
[Tracks] 11
Done. (0.024s)
video 1/1 (2/266) /content/zero-shot-object-tracking/data/video/cars.mp4: yolov5 inference

[Detections]
9 cars, 1 trucks, 
[Tracks] 10
Done. (0.010s)
video 1/1 (3/266) /content/zero-shot-object-tracking/data/video/cars.mp4: yolov5 inference

After the script finishes, the processed video is saved into the runs/detect/exp[num] folder. You can download it to evaluate the results visually like so:

# Download and view the video on your host device
from google.colab import files 
files.download("./runs/detect/exp/cars.mp4")

Here is a view of our car object tracking video after processing.

Example object tracks on our test cars video.

Conclusion

Congratulations! You have successfully learned how to implement object tracking with your custom object detection model - a task that was previously much more difficult without zero-shot features.

Happy training, happy inferencing, and most importantly, happy tracking!