Computer vision models deployed on an edge device such as an NVIDIA Jetson do not need a regular network connection to run inference. You can run a model locally, on your device. If necessary, you can send inference results across your network when a connection is available.

In this guide, we are going to discuss how to deploy computer vision models offline using Roboflow Inference, an open source, scalable inference server through which you can run fine-tuned and foundation vision models.

We will show how to:

  1. Configure a model for use with Inference
  2. Set up Inference
  3. Run a vision model on an image and on a webcam stream

Here is an example of predictions from a shipping container detection model that runs offline:

Without further ado, let’s get started!

What is Roboflow Inference?

Roboflow Inference is an inference server on which you can run fine-tuned and foundation computer vision models. With Inference, you can deploy object detection, classification, and segmentation models on an edge device, allowing local – and offline – access. You can retrieve model predictions from Inference by making HTTP requests or by using the Python SDK.

Inference has been built with edge deployment in mind. You can run vision models on ARM devices like the Raspberry Pi, CUDA-enabled devices such as the NVIDIA Jetson (with TRT support), x86 devices, and more.

Inference is production-ready. The Inference codebase powers millions of API calls made to Roboflow’s hosted inference API, as well as edge devices in enterprises with complex computer vision deployments.

With Inference, you can run the following models offline:

  • YOLOv5 for object detection, classification, and segmentation
  • YOLOv7 for segmentation
  • YOLOv8 for object detection, classification, and segmentation
  • CLIP
  • Segment Anything (SAM) for segmentation
  • DocTR (for OCR)

When you first use Inference, model weights are downloaded to a Docker container running on your device. This Docker container manages Inference. After that, you can run Inference offline. Note: You will need to connect to the internet every time you update your model, or every 30 days, whichever comes first.

Preparation: Upload or Train a Model on Roboflow

In this guide, we will show how to deploy a YOLOv8 object detection model. To deploy a YOLOv5, YOLOv7, or YOLOv8 model with Inference, you need to train a model on Roboflow, or upload a supported model to Roboflow.

  • Learn how to deploy a trained model to Roboflow
  • Learn how to train a model on Roboflow

Foundation models such as CLIP, SAM, and DocTR work out of the box. You will still need an internet connection to download the weights, after which point you can run them offline.

Once you have a model hosted on Roboflow, you can start deploying your model with Inference.

Step #1: Set Up Roboflow Inference

Roboflow Inference runs in Docker, with Dockerfiles available for a range of popular edge devices and compute architectures. The Inference Docker container manages all the dependencies associated with the models you deploy, so you can focus more on building your application logic.

First, install Docker. See the official Docker installation instructions for guidance.

The command you run to download and start the Inference Docker container will depend on the system architecture you are using. For example, if you have a CUDA-enabled GPU, you can use the GPU container. Here is the command you need to run to download and start the Inference GPU container:

docker run --network=host --gpus=all \

This command will pull the Docker image from Docker Hub. Once the image has been downloaded, the container will start.

Roboflow Inference will run at http://localhost:9001.
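
Before sending images, it can help to confirm that something is listening on port 9001. The following is a minimal sketch that uses only the Python standard library. The helper name is_server_reachable is hypothetical and not part of Inference, and a successful TCP connection only shows that the port is open, not that a model has been loaded.

```python
import socket


def is_server_reachable(host: str = "localhost", port: int = 9001, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it is not part of Roboflow Inference.
    """
    try:
        # create_connection raises OSError if the port is closed or unreachable
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    status = "reachable" if is_server_reachable() else "not reachable"
    print(f"Inference server at localhost:9001 is {status}")
```

If the check reports that the server is not reachable, the container has probably not finished starting, or it is running on a different host or port.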

Once you have Inference set up, you can start running a computer vision model on images and webcam streams.

Step #2: Run a Vision Model on an Image

To run a vision model on an image, we can use the Inference SDK. 

First, install the Inference Python package, the Inference SDK, and supervision, a tool with utilities for managing vision predictions:

pip install inference inference-sdk supervision

Next, create a new Python file and add the following code:

import cv2
import supervision as sv
from inference_sdk import InferenceConfiguration, InferenceHTTPClient

image = "containers.jpeg"
MODEL_ID = "logistics-sz9jr/2"

config = InferenceConfiguration(confidence_threshold=0.5, iou_threshold=0.5)

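client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # the local Inference server from Step #1
    api_key="ROBOFLOW_API_KEY",
)
# Apply the thresholds and choose which model infer() should use
client.configure(config)
client.select_model(MODEL_ID)

predictions = client.infer(image)

print(predictions)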

Above, replace:

  1. containers.jpeg with the path to the image on which you want to run inference.
  2. ROBOFLOW_API_KEY with your Roboflow API key. Learn how to retrieve your Roboflow API key.
  3. MODEL_ID with your Roboflow model ID. Learn how to retrieve your model ID.
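
The script also sets confidence_threshold and iou_threshold to 0.5. The first drops predictions that the model scores below 0.5. The second is the overlap limit applied when near-duplicate boxes for the same object are suppressed. As a rough illustration of what that overlap measure means, here is a sketch of intersection over union (IoU) for two boxes given as (x_min, y_min, x_max, y_max). The function name and the sample boxes are made up for this example, and this is not the code Inference itself runs.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle, if there is one
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height clamp to zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two illustrative 100x100 boxes, offset by 50 pixels horizontally
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))
```

These two boxes share a third of their combined area, so with a threshold of 0.5 both would be expected to survive as separate detections.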

Let's run a logistics model that can identify shipping containers on this image:

When you run the script for the first time, the weights for the model you are using will be downloaded for use on your machine. These weights are cached for future use. Then, the image will be sent to your Docker container. Your selected model will run on the image. A JSON response will be returned with predictions from your model.

Here is an example of predictions from an object detection model:

{'time': 0.07499749999988126, 'image': {'width': 1024, 'height': 768}, 'predictions': [{'x': 485.6, 'y': 411.2, 'width': 683.2, 'height': 550.4, 'confidence': 0.578909158706665, 'class': 'freight container', 'class_id': 5}]}
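
In this response, x and y give the center of the bounding box, and width and height give its size, all in pixels. Many drawing and cropping functions expect corner coordinates instead. The sketch below converts the prediction shown above to corners. The helper name to_corners is made up for this example.

```python
def to_corners(prediction):
    """Convert a center-based box to (x_min, y_min, x_max, y_max)."""
    half_w = prediction["width"] / 2
    half_h = prediction["height"] / 2
    return (
        prediction["x"] - half_w,
        prediction["y"] - half_h,
        prediction["x"] + half_w,
        prediction["y"] + half_h,
    )


# The prediction from the example response above
prediction = {
    "x": 485.6, "y": 411.2, "width": 683.2, "height": 550.4,
    "confidence": 0.578909158706665, "class": "freight container", "class_id": 5,
}

x_min, y_min, x_max, y_max = to_corners(prediction)
print(f"{prediction['class']}: ({x_min:.1f}, {y_min:.1f}) to ({x_max:.1f}, {y_max:.1f})")
```

For the example prediction, the box runs from roughly (144, 136) to (827, 686), which fits inside the 1024 by 768 image reported in the response.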

You can then plot these predictions using supervision. Add the following code to the end of your script:

class_ids = {}

for p in predictions["predictions"]:
    class_id = p["class_id"]
    if class_id not in class_ids:
        class_ids[class_id] = p["class"]

detections = sv.Detections.from_inference(predictions)

image = cv2.imread("containers.jpeg")

box_annotator = sv.BoxAnnotator()
# Build one "class confidence" label per detection
labels = [
    f"{class_ids[class_id]} {confidence:0.2f}"
    for class_id, confidence in zip(detections.class_id, detections.confidence)
]

annotated_frame = box_annotator.annotate(
    scene=image.copy(), detections=detections, labels=labels
)

sv.plot_image(image=annotated_frame, size=(16, 16))

This code will allow you to plot model predictions. Here is an example of a logistics object detection model running on an image, with predictions plotted using supervision:

Step #3: Run a Vision Model on a Webcam

You can also run your vision model on a webcam or RTSP stream, in close to real time.

Create a new Python file and add the following code:

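from inference import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes

pipeline = InferencePipeline.init(
    model_id="rock-paper-scissors-sxsw/11", # from Universe
    video_reference=0, # integer device id of webcam or "rtsp://" for RTSP stream
    on_prediction=render_boxes, # draw predictions on each frame
)

# Start processing frames and block until the stream ends
pipeline.start()
pipeline.join()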

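Above, replace rock-paper-scissors-sxsw/11 with your Roboflow model ID. Then, run the following command to set your API key as an environment variable, replacing the placeholder value with your key:

export ROBOFLOW_API_KEY="your-api-key"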

Learn how to retrieve your Roboflow API key.

When you run this code, your model will run on frames from your webcam:
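
Drawing boxes is one way to handle results. InferencePipeline calls a function for each frame's predictions, and render_boxes in the code above is one such function. As a sketch of an alternative, the function below takes the same two arguments (a predictions dictionary and the video frame) and counts detections per class instead of rendering them. The name count_classes and the sample predictions are hypothetical, and the function is exercised here on a hand-written dictionary rather than a live stream.

```python
from collections import Counter


def count_classes(predictions, video_frame=None):
    """Print and return how many detections of each class a frame contains.

    Hypothetical callback: `predictions` is assumed to be a dictionary with a
    "predictions" list like the JSON response shown earlier. `video_frame`
    is accepted to match the callback shape but is not used.
    """
    counts = Counter(p["class"] for p in predictions.get("predictions", []))
    print(dict(counts))
    return counts


# Hand-written sample standing in for the predictions from one webcam frame
sample = {
    "predictions": [
        {"class": "rock", "confidence": 0.91},
        {"class": "paper", "confidence": 0.84},
        {"class": "rock", "confidence": 0.77},
    ]
}
count_classes(sample)
```

Passing a function like this as the on_prediction argument of InferencePipeline.init, in place of render_boxes, is the assumed way to wire it into the pipeline.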



In this guide, we walked through how to configure a model for use with Inference, how to set up Inference, and how to run a vision model on an image or video.

With Roboflow Inference, you can deploy computer vision models offline. Inference is an open source inference server through which you can run your vision models, as well as foundation models such as CLIP and SAM. Inference has been optimized for different devices such as CUDA-enabled GPUs, TRT-accelerated devices, and more.