Real Time Streaming Protocol (RTSP) enables you to access video from an internet-connected camera. With the Roboflow Inference Python package, you can run computer vision models on frames returned by an internet-connected camera over RTSP.

There are many commercial and industrial uses for running vision models using RTSP data. Consider a scenario where you want to track how many cars are in a drive-through lane at a given time. You could use computer vision to count the number of cars visible from a camera pointed toward the drive-through lane.

In this guide, we are going to show you how to run computer vision models on RTSP data. Our tests use an object detection model, but you can also use segmentation, classification, and foundation models (e.g. CLIP) with a few modifications to the code we will walk through in this guide. These modifications will be pointed out as we go.

Here is an example of a video feed from an RTSP camera on which a computer vision model is run:


Without further ado, let’s get started!

Run Vision Models on RTSP Stream Data

To run vision models on RTSP stream data, we will use Inference, a high-performance, open-source server through which you can run computer vision models. Inference can run many different types of models, from fine-tuned YOLOv8 models to CLIP and SAM. We will use a fine-tuned YOLOv8 model in this guide.

Let’s walk through all the steps to run a computer vision model on data from an RTSP stream.

Preparation (Optional): Upload a Model to Roboflow

To run computer vision models on RTSP stream data, you will need a model to run. In this guide, we are going to show how to run a fine-tuned object detection model trained on Roboflow. If you already have a model trained on Roboflow, skip to the next step.

If you have trained a YOLOv5 or YOLOv8 detection, classification, or segmentation model, or a YOLOv7 segmentation model, you can upload your model to Roboflow and use it to run inference on your RTSP video stream.

To upload a model to Roboflow, first install the Roboflow Python package:

pip install roboflow

Then, create a new Python file and paste in the following code:

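from roboflow import Roboflow

HOME = "/path/to/your/project"  # directory that contains runs/detect/train/
DATASET_VERSION = 1  # version number of the dataset the model was trained on

rf = Roboflow(api_key="API_KEY")
project = rf.workspace().project("PROJECT_ID")
# Upload the YOLOv8 weights from the training run directory
project.version(DATASET_VERSION).deploy(model_type="yolov8", model_path=f"{HOME}/runs/detect/train/")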

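In the code above, replace API_KEY with your Roboflow API key, PROJECT_ID with your project ID, and DATASET_VERSION with the version number of the dataset you trained on. Set HOME to the directory that contains your training run, so that model_path points to the folder holding the weights you want to upload. Learn how to retrieve your API key. When you run the script, your weights will be uploaded to Roboflow. Shortly afterward, your model will be accessible over an API and available for use in Inference. To learn more about uploading model weights to Roboflow, check out our full guide to uploading weights to Roboflow.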
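The deploy() call above points at a training run directory rather than at a single file, so a typo in the path is easy to miss until the upload fails. The sketch below is a small pre-flight check you could run first. It assumes the default Ultralytics output layout, where training saves the best checkpoint to weights/best.pt inside the run directory; the helper name find_yolov8_weights is hypothetical and is not part of the Roboflow package.

```python
from pathlib import Path


def find_yolov8_weights(train_dir: str) -> Path:
    """Return the path to best.pt inside an Ultralytics training run directory.

    Assumes the default Ultralytics layout (<train_dir>/weights/best.pt).
    Raises FileNotFoundError if the file is missing, so the problem surfaces
    before any upload is attempted.
    """
    weights = Path(train_dir) / "weights" / "best.pt"
    if not weights.is_file():
        raise FileNotFoundError(f"Expected trained weights at {weights}")
    return weights
```

For example, calling find_yolov8_weights(f"{HOME}/runs/detect/train/") before deploy() either returns the checkpoint path or stops the script with a clear error message.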

If you do not already have a model, check out the Roboflow Getting Started guide to learn how to train a model for your use case.

Step #1: Install Dependencies

With a model ready, we can start setting up a script to run inference with our model on an RTSP stream.

First, we need to install the required dependencies for the project. We will use the inference package to run our model.

To install the required dependencies, run the following command:

pip install inference

Now, let’s set up our stream.

Step #2: Set up an InferencePipeline object

The InferencePipeline object allows you to read frames from a webcam or an RTSP stream. It accepts a callback function that is called with the model's predictions for every frame read from the stream. This callback function is referred to as a "sink". You can write a custom sink that contains any logic you want, or you can use one of the built-in sinks that ship with Inference.
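To make the idea of a sink concrete, here is a plain-Python sketch that mimics how a pipeline hands each frame's predictions to a callback, plus a helper that combines several sinks into one callback, which is handy when you pass a single function as on_prediction but want more than one behavior per frame. Neither run_sink_over_frames nor chain_sinks is part of Inference; both are hypothetical illustrations, and in a real pipeline the frames would be VideoFrame objects rather than arbitrary values.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

# A sink receives the predictions for one frame and the frame itself.
Sink = Callable[[Dict[str, Any], Any], None]


def run_sink_over_frames(
    results: Iterable[Tuple[Dict[str, Any], Any]],
    on_prediction: Sink,
) -> int:
    """Call the sink once per (predictions, frame) pair; return the number of frames handled.

    This only imitates the callback pattern; it is not how InferencePipeline
    is implemented internally.
    """
    handled = 0
    for predictions, frame in results:
        on_prediction(predictions, frame)
        handled += 1
    return handled


def chain_sinks(*sinks: Sink) -> Sink:
    """Combine several sinks into one callback that calls each of them in order."""
    def combined(predictions: Dict[str, Any], frame: Any) -> None:
        for sink in sinks:
            sink(predictions, frame)
    return combined
```

Calling each sink in a fixed order keeps side effects predictable, for example drawing boxes before writing a log line for the same frame.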

Let’s run a model trained on the Microsoft COCO benchmark to test that the stream is working. Create a new Python file and add the following code:

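from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes

pipeline = InferencePipeline.init(
    model_id="yolov8x-1280",  # COCO-trained model; replace with your own model ID
    video_reference=0,  # webcam device number as an integer, or an RTSP URL string
    on_prediction=render_boxes,  # built-in sink that draws predictions on each frame
    api_key="API_KEY",  # your Roboflow API key
)
pipeline.start()  # start reading frames and running inference
pipeline.join()  # block until the stream ends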

In this code, we import a built-in sink, render_boxes, which uses Supervision to draw predictions on each input frame.
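Roboflow object detection predictions describe each box by its center point (x, y) plus its width and height, while drawing code generally works with corner coordinates. The sketch below shows that conversion for a single prediction dict. The key names x, y, width, and height follow the Roboflow prediction format; the helper name center_to_corners is hypothetical, and render_boxes handles this for you when you use it.

```python
from typing import Dict, Tuple


def center_to_corners(pred: Dict[str, float]) -> Tuple[float, float, float, float]:
    """Convert a center-based box (x, y, width, height) to (x_min, y_min, x_max, y_max)."""
    half_w = pred["width"] / 2
    half_h = pred["height"] / 2
    return (
        pred["x"] - half_w,  # left edge
        pred["y"] - half_h,  # top edge
        pred["x"] + half_w,  # right edge
        pred["y"] + half_h,  # bottom edge
    )
```

For example, center_to_corners({"x": 100, "y": 50, "width": 40, "height": 20}) returns (80.0, 40.0, 120.0, 60.0).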

If we instead wanted to create a custom sink, we could define a function that takes a dictionary of predictions, a VideoFrame object, and any additional keyword arguments:

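from inference.core.interfaces.camera.entities import VideoFrame

def on_prediction(
    predictions: dict,
    video_frame: VideoFrame,
    **kwargs
) -> None:
    # Add your own per-frame logic here; this example just logs the frame ID and predictions
    print(video_frame.frame_id, predictions)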
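As an example of custom logic, here is a sketch of a sink for the drive-through scenario from the introduction: it counts the cars in each frame. It assumes an object detection response shaped like {"predictions": [{"class": ..., "confidence": ...}, ...]} and a model with a "car" class (the COCO model includes one). The helper count_class and the 0.5 confidence threshold are illustrative choices, not part of Inference, and video_frame is typed loosely here so the snippet runs without Inference installed.

```python
from typing import Any, Dict


def count_class(predictions: Dict[str, Any], class_name: str, min_confidence: float = 0.5) -> int:
    """Count detections of one class at or above a confidence threshold.

    Assumes the shape {"predictions": [{"class": ..., "confidence": ...}, ...]}.
    """
    return sum(
        1
        for p in predictions.get("predictions", [])
        if p.get("class") == class_name and p.get("confidence", 0.0) >= min_confidence
    )


def on_prediction(predictions: Dict[str, Any], video_frame: Any, **kwargs) -> None:
    """Example custom sink: print the number of cars visible in each frame."""
    print(f"cars in frame: {count_class(predictions, 'car')}")
```

Passing on_prediction=on_prediction to InferencePipeline.init instead of render_boxes would print a running count rather than opening a preview window.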

In the InferencePipeline code above, replace the “video_reference” value with the RTSP URL of your camera. You can also pass an integer device ID to test on a webcam connected to your computer. Then, replace “model_id” with the model name and version associated with your project. Learn how to retrieve your model name and ID. You will also need to provide your API key. Learn how to find your Roboflow API key.
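Many cameras expect a username and password embedded in the RTSP URL, and characters such as @ or : in a password must be percent-encoded, or the URL will be split in the wrong place. Here is a minimal sketch that assembles such a URL. The helper name build_rtsp_url is hypothetical; 554 is the standard default RTSP port, but the stream path varies by camera vendor, so check your camera's documentation.

```python
from typing import Optional
from urllib.parse import quote


def build_rtsp_url(
    host: str,
    path: str = "",
    port: int = 554,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> str:
    """Assemble an rtsp:// URL, percent-encoding any credentials."""
    auth = ""
    if username is not None:
        auth = quote(username, safe="")  # encode every reserved character
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"rtsp://{auth}{host}:{port}/{path.lstrip('/')}"
```

For example, build_rtsp_url("192.168.1.10", "stream1", username="admin", password="p@ss:word") returns "rtsp://admin:p%40ss%3Aword@192.168.1.10:554/stream1", which you could pass as video_reference.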

Step #3: Test the Model on the Stream

We are now ready to test our model on a stream. To do so, run the Python script that contains your code. If you kept the argument on_prediction=render_boxes from the previous step, a window will appear showing the video from your RTSP stream with the model's predictions drawn on each frame.
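While testing, it can be useful to know whether the model keeps up with the stream. The sketch below is a hypothetical callable sink (FpsMeter is not part of Inference) that records when each frame's predictions arrive and estimates frames per second over a sliding window. The clock is injectable so the class can be tested without waiting in real time.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Dict, Optional


class FpsMeter:
    """Callable sink that estimates frames per second over a sliding window of arrivals."""

    def __init__(self, window: int = 30, clock: Callable[[], float] = time.monotonic):
        self._times: Deque[float] = deque(maxlen=window)  # most recent arrival times
        self._clock = clock

    def __call__(self, predictions: Dict[str, Any], video_frame: Any, **kwargs) -> None:
        # Record when this frame's predictions reached the sink
        self._times.append(self._clock())

    @property
    def fps(self) -> Optional[float]:
        """Frames per second across the window, or None until two frames have arrived."""
        if len(self._times) < 2:
            return None
        elapsed = self._times[-1] - self._times[0]
        if elapsed <= 0:
            return None
        return (len(self._times) - 1) / elapsed
```

Using an FpsMeter instance directly as on_prediction would replace render_boxes, so no preview window would appear; you might instead call it from inside your own sink and print meter.fps every so often.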

Here is an example of the model running on a stream:


Conclusion

RTSP stream support is a common feature in internet-connected cameras. With the Roboflow Inference Python package, you can run a computer vision model on all frames returned by an RTSP stream. Only a few lines of code are required to set up a stream; then you can start writing logic that is applied to each frame.

In this guide, we walked through how to run vision models on RTSP stream data. We wrote logic that lets us visualize bounding boxes that represent predictions returned by an object detection model. The code can be extended with your own custom logic, or modified to support different model types.