Real Time Streaming Protocol (RTSP) enables you to access video from an internet-connected camera. With the Roboflow Inference Python package, you can use computer vision models on frames returned by an internet-connected camera over the RTSP protocol.

There are many commercial and industrial uses for running vision models using RTSP data. Consider a scenario where you want to track how many cars are in a drive-through lane at a given time. You could use computer vision to count the number of cars visible from a camera pointed toward the drive-through lane.

In this guide, we are going to show you how to run computer vision models on RTSP data. Our tests will use an object detection model, but you can use segmentation, classification, and foundation models (e.g. CLIP) with a few modifications to the code we will walk through in this guide. These modifications will be pointed out as we go.

Here is an example of a video feed from an RTSP camera on which a computer vision model is run:


Without further ado, let’s get started!

Run Vision Models on RTSP Stream Data

To run vision models on RTSP stream data, we will use Inference, a high-performance, open-source server through which you can run computer vision models. Inference can run many different types of models, from fine-tuned YOLOv8 models to CLIP and SAM. We will use a fine-tuned YOLOv8 model in this guide.

Let’s walk through all the steps to run a computer vision model on data from an RTSP stream.

Preparation (Optional): Upload a Model to Roboflow

To run computer vision models on RTSP stream data, you will need a model to run. In this guide, we are going to show how to run a fine-tuned object detection model trained on Roboflow. If you already have a model trained on Roboflow, skip to the next step.

If you have trained a YOLOv5 or YOLOv8 detection, classification, or segmentation model, or a YOLOv7 segmentation model, you can upload your model to Roboflow for use in running inference on your RTSP video stream.

To upload a model to Roboflow, first install the Roboflow Python package:

pip install roboflow

Then, create a new Python file and paste in the following code:

from roboflow import Roboflow

rf = Roboflow(api_key="API_KEY")
project = rf.workspace().project("PROJECT_ID")
project.version(DATASET_VERSION).deploy(model_type="yolov8", model_path="path/to/runs/detect/train/")

In the code above, add your API key, your project ID, the dataset version number, and the path to the model weights you want to upload. Learn how to retrieve your API key. Your weights will be uploaded to Roboflow, and your model will shortly be accessible over an API and available for use in Inference. To learn more about uploading model weights to Roboflow, check out our full guide to uploading weights to Roboflow.

If you do not already have a model, check out the Roboflow Getting Started guide to learn how to train a model for your use case.

Step #1: Install Dependencies

With a model ready, we can start setting up a script to run inference with our model on an RTSP stream.

First, we need to install the required dependencies for the project. We will use inference to run our model, supervision to annotate predictions from our model, and OpenCV to show the camera stream on our computer.

To install the required dependencies, run the following command:

pip install inference supervision opencv-python

Now, let’s set up our stream.

Step #2: Set up an inference.Stream() object

The inference.Stream() object allows you to access a webcam or an RTSP stream. The Stream() object accepts a callback function which is applied to every frame read from the stream. You can include any logic you want in the callback.
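Before wiring up the real stream, the callback contract can be sketched in plain Python. The loop below is a stand-in for what the library does internally, not Inference's actual implementation: the point is that your function is called once per frame with the model output and the frame.

```python
def fake_stream(frames, on_prediction):
    # Stand-in for a stream loop: produce a placeholder "prediction"
    # for each frame, then hand the result and the frame to the callback.
    for frame in frames:
        predictions = {"predictions": [], "frame_id": frame}  # placeholder output
        on_prediction(predictions, frame)

seen = []
fake_stream([0, 1, 2], lambda preds, frame: seen.append(frame))
print(seen)  # [0, 1, 2]
```

Any logic you put in the callback runs on every frame, which is why the callback is the natural place for annotation, counting, or alerting logic.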

Let’s run a model trained on the Microsoft COCO benchmark to test the stream is working. Create a new Python file and add the following code:

import cv2
import inference
import supervision as sv

annotator = sv.BoxAnnotator()

def render(predictions, image):
    # Map each class ID to its class name for labeling
    classes = {item["class_id"]: item["class"] for item in predictions["predictions"]}

    detections = sv.Detections.from_roboflow(predictions)

    image = annotator.annotate(
        scene=image,
        detections=detections,
        labels=[classes[i] for i in detections.class_id]
    )

    cv2.imshow("Prediction", image)
    cv2.waitKey(1)

inference.Stream(
    source="rtsp://your-stream-url",  # or "webcam" to use a connected webcam
    model="model-name/version",
    api_key="API_KEY",
    on_prediction=render
)

In this code, we import the required dependencies, then create an annotator object that we will use to annotate predictions. If you are running a segmentation model, you can use sv.MaskAnnotator() instead of sv.BoxAnnotator(). No further changes are required to annotate segmentation predictions. If you are running a classification model, you may want to use OpenCV (cv2) to write predictions on the stream.

Next, we define a function called render() which will be applied to every frame. render() takes in two arguments: predictions and image. predictions refers to the predictions returned by our model. image is the frame on which inference was run, represented as an OpenCV frame. In this example, we annotate the provided frame and show the results.
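To make the shape of predictions concrete, here is a hedged sketch of what an object detection payload might look like. The field names are based on Roboflow's detection response format; the values are made up for illustration.

```python
# Hypothetical payload shaped like a Roboflow object detection response
predictions = {
    "predictions": [
        {"x": 320, "y": 240, "width": 50, "height": 80,
         "confidence": 0.91, "class": "car", "class_id": 2},
        {"x": 100, "y": 200, "width": 40, "height": 60,
         "confidence": 0.84, "class": "person", "class_id": 0},
    ]
}

# The same mapping render() builds: class_id -> class name
classes = {item["class_id"]: item["class"] for item in predictions["predictions"]}
print(classes)  # {2: 'car', 0: 'person'}
```

This mapping is what lets us attach human-readable labels to each bounding box when annotating the frame.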

Finally, we create a stream using inference.Stream(). In this section of code, replace the source value with the RTSP URL for your camera. You can specify “webcam” to test on a webcam connected to your computer, too. Then, replace the model value with the model name and version ID associated with your project. Learn how to retrieve your model name and ID. You will also need to provide your API key. Learn how to find your Roboflow API key.

We have also written a guide that shows how to use inference.Stream() with CLIP, a vision model developed by OpenAI. You can run CLIP on every frame on an RTSP stream with Inference.

Step #3: Test the Model on the Stream

We are now ready to test our model on a stream. To do so, run the Python script in which you have written your code. If you have kept the “cv2.imshow” code from the last section, a window will appear in which you can see the video from the provided RTSP stream.

Here is an example of the model running on a stream:



RTSP stream support is a common feature in internet-connected cameras. With the Roboflow Inference Python package, you can run a computer vision model on every frame returned by an RTSP stream. Only a few lines of code are required to set up a stream; then you can start writing logic that is applied to each frame.

In this guide, we walked through how to run vision models on RTSP stream data. We wrote logic that lets us visualize bounding boxes that represent predictions returned by an object detection model. The code can be extended with your own custom logic, or modified to support different model types.
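As a starting point for custom logic, here is a hedged sketch of the drive-through use case mentioned earlier: counting cars in each frame. The count_cars helper is hypothetical, and the payload fields are assumed to match Roboflow's detection response format.

```python
def count_cars(predictions):
    # Count detections whose class name is "car"
    return sum(1 for p in predictions["predictions"] if p["class"] == "car")

# Example payload with made-up values
example = {
    "predictions": [
        {"class": "car", "class_id": 2, "confidence": 0.91},
        {"class": "car", "class_id": 2, "confidence": 0.77},
        {"class": "person", "class_id": 0, "confidence": 0.84},
    ]
}

print(count_cars(example))  # 2
```

A function like this could be called from inside the render() callback so the count updates on every frame.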