Using computer vision, you can analyze videos to uncover insights relevant to your project.

For example, you could run a football player detection model on a folder of videos stored in the cloud to calculate statistics about a game, or use a foundation model, like CLIP, to classify frames in a video and identify when specific scenes (e.g., outdoor scenes) happen in a video.

Using the Roboflow Video Inference API, you can analyze videos using fine-tuned computer vision models (e.g., a football player detection model or a crack detection model) and state-of-the-art foundation vision models such as CLIP.

In this guide, we are going to show how to analyze a folder of videos hosted on Google Cloud Platform with the Roboflow Video Inference API.

By the end of this guide, we will have the following video that shows the results from running an object detection model on a football game:


Without further ado, let’s get started!

Step #1: Select a Computer Vision Model

You can use the Roboflow Video Inference API with:

  • Public fine-tuned vision models hosted on Roboflow Universe;
  • Models you have privately trained on Roboflow; and
  • Foundation models such as CLIP and a gaze detection model.

Roboflow Universe hosts over 50,000 pre-trained models that cover a range of use cases, from logistics to sports analysis to defect detection. If you have a use case in mind, there may already be a Universe model you can use.

If you are building a logistics application, for example, we recommend the Roboflow Universe Logistics model. This model can identify 20 different objects relevant to logistics, such as wooden pallets and people. You can also use it as a source of pre-trained weights, which may perform better than COCO-trained weights for logistics use cases.

You can use a model you have trained on Roboflow to analyze a video, too. For example, if you have trained a defect detection model on your own data, you can use that model to analyze your videos. To learn more about how to train your own computer vision model, refer to the Roboflow Getting Started guide.

You can also use foundation models hosted on Roboflow. For example, you can use CLIP to classify frames in a video, ideal for video analysis. Or you can use our gaze detection system to identify the direction in which people are looking, ideal for exam proctoring. Read the Roboflow Video Inference API documentation to learn more about supported foundation models.

Step #2: Submit Videos for Analysis

Once you have selected a model, you are ready to start submitting videos from Google Cloud Platform for analysis. We are going to use a football player detection model for this tutorial, but you can use any model chosen as described in the previous section.

First, we need to install a few dependencies:

pip install google-cloud-storage roboflow supervision

We will use google-cloud-storage to retrieve our videos from GCP Cloud Storage. We will use the roboflow package to submit videos for analysis. We will use supervision to plot predictions from the Roboflow Video Inference API onto a video.

Next, we need to authenticate with Google Cloud. We can do so using the gcloud CLI. To learn how to authenticate with Google Cloud, refer to the official Google Cloud authentication documentation.

You will also need a service account with permission to read the contents of your GCP Storage buckets. Refer to the IAM documentation for more information. When you have created your service account, download a JSON key from the IAM dashboard for your service account.

We are now ready to write a script to submit videos for analysis.

Create a new Python file and add the following code:

from google.cloud import storage
import datetime
import json
import tqdm
from roboflow import Roboflow

BUCKET_NAME = "my-bucket"

rf = Roboflow(api_key="API_KEY")
project = rf.workspace().project("football-players-detection-3zvbc")
model = project.version(2).model

storage_client = storage.Client.from_service_account_json("./credentials.json")
bucket = storage_client.bucket(BUCKET_NAME)

files = [file.name for file in bucket.list_blobs()]

file_results = {k: None for k in files}

for file in tqdm.tqdm(files, total=len(files)):
    # Generate a signed URL so the Roboflow API can read the video
    url = bucket.blob(file).generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(hours=2),
        method="GET",
    )

    # Submit the video for analysis at 5 frames per second
    job_id, signed_url = model.predict_video(
        url,
        fps=5,
        prediction_type="batch-video",
    )

    # Block until results are available for this video
    results = model.poll_until_video_results(job_id)

    file_results[file] = results

    with open("results.json", "w+") as results_file:
        results_file.write(json.dumps(file_results))

In this code, we:

  1. Make a list of all files in the specified GCP bucket.
  2. Send each file for analysis to the Roboflow API using the specified model. In this example, we have used a football player detection model.
  3. Wait for results from each analysis using the poll_until_video_results function.
  4. Save the results from inference to a JSON file.

Replace credentials.json with the IAM JSON credentials file for your service account.

The poll_until_video_results function waits for results to be available. The function is blocking. For large-scale use cases, you could adapt the code above to support concurrency so you can commence multiple analyses at the same time without waiting for the results from each one.
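As an illustration of one concurrent approach, a thread pool can submit several analyses at once. Here, analyze_video is a hypothetical stand-in for the signed-URL, predict_video, and polling steps above, not part of the Roboflow SDK:

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_video(file):
    # Hypothetical stand-in: in the real script, this would generate a
    # signed URL, call model.predict_video(), and poll for results.
    return {"file": file, "status": "complete"}

files = ["video1.mp4", "video2.mp4", "video3.mp4"]

# Run up to four analyses at the same time.
with ThreadPoolExecutor(max_workers=4) as executor:
    file_results = dict(zip(files, executor.map(analyze_video, files)))

print(file_results["video1.mp4"]["status"])  # complete
```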

In the code above, replace:

  1. MODEL_ID and VERSION with the model ID and version ID associated with your Roboflow model. Learn how to retrieve your Roboflow model and version IDs.
  2. API_KEY with your Roboflow API key. Learn how to retrieve your Roboflow API key.
  3. fps=5 with the frame rate at which you want to run inference. With `fps=5`, inference is run at a rate of five frames per second. We recommend choosing a low FPS for most use cases; choosing a high FPS will result in a higher analysis cost since more inferences need to be run.
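To see how FPS affects cost, consider the approximate number of inferences for a video of a given length. The helper below is purely illustrative, not part of the Roboflow SDK:

```python
def inference_count(duration_seconds: float, fps: int) -> int:
    # Approximate number of frames on which inference will run.
    return int(duration_seconds * fps)

# A 60-second video at fps=5 triggers ~300 inferences;
# the same video at fps=30 triggers ~1,800.
print(inference_count(60, 5))   # 300
print(inference_count(60, 30))  # 1800
```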

Then, run the code.

The code above saves results to a file called results.json. The results from each inference are associated with the name of the file that was processed. The file takes the following structure:

{"file.mp4": inference_results}
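As a sketch of working with this structure, you can write and read back the mapping with the json module. The payload below is placeholder data, not real inference results:

```python
import json

# Placeholder payload standing in for real inference results.
file_results = {"file.mp4": {"frame_offset": [0, 5], "predictions": []}}

with open("results.json", "w+") as f:
    f.write(json.dumps(file_results))

with open("results.json", "r") as f:
    results = json.load(f)

# Results are keyed by the name of the processed file.
print(list(results.keys()))  # ['file.mp4']
```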

To learn about the structure of inference_results, refer to the Roboflow Video Inference API documentation for the model you are using. For this guide, we can refer to the fine-tuned model documentation since we are working with a fine-tuned object detection model.

Step #3: Visualize Model Results

In the last step, we ran analysis on files in a Google Cloud bucket. Now, let’s visualize the model results. For this section, we will focus on showing how to visualize object detection results since this guide walks through an object detection use case.

Create a new Python file and add the following code:

import supervision as sv
import numpy as np
import json
import roboflow


rf = roboflow.Roboflow()
project = rf.workspace().project("football-players-detection-3zvbc")
model = project.version(2).model

VIDEO_NAME = "video1.mp4"
MODEL_NAME = "football-players-detection-3zvbc"

with open("results.json", "r") as f:
    results = json.load(f)

model_results = results[VIDEO_NAME][MODEL_NAME]

# Remove tracker IDs, which we do not need for plotting predictions
for result in model_results:
    for r in result["predictions"]:
        del r["tracker_id"]

frame_offset = results[VIDEO_NAME]["frame_offset"]

def callback(scene: np.ndarray, index: int) -> np.ndarray:
    if index in frame_offset:
        # We have predictions for this exact frame
        detections = sv.Detections.from_inference(
            model_results[frame_offset.index(index)]
        )
        class_names = [i["class"] for i in model_results[frame_offset.index(index)]["predictions"]]
    else:
        # Fall back to the predictions from the nearest analyzed frame
        nearest = min(frame_offset, key=lambda x: abs(x - index))
        detections = sv.Detections.from_inference(
            model_results[frame_offset.index(nearest)]
        )
        class_names = [i["class"] for i in model_results[frame_offset.index(nearest)]["predictions"]]

    bounding_box_annotator = sv.BoundingBoxAnnotator()
    label_annotator = sv.LabelAnnotator()

    labels = [class_names[i] for i, _ in enumerate(detections)]

    annotated_image = bounding_box_annotator.annotate(
        scene=scene, detections=detections)
    annotated_image = label_annotator.annotate(
        scene=annotated_image, detections=detections, labels=labels)

    return annotated_image

sv.process_video(
    source_path=VIDEO_NAME,
    target_path="output.mp4",
    callback=callback,
)


In this code, we retrieve the inference results for a video that we calculated in the last step. We open the raw video (downloaded from your Google Cloud Storage bucket), plot the inference results on each frame, then save the annotated video to a new file called “output.mp4”.

In the last step, we ran inference at 5 FPS, so we don’t have results to plot for most frames. Plotting only the analyzed frames would create a flickering effect, since we ran inference at 5 FPS but our video is stored at 24 FPS.

To counter this, the script above chooses the results closest to the current frame. Thus, if there are no predictions for a frame, our code will plot predictions from the nearest frame.
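The nearest-frame lookup is just a minimum over the absolute frame distance, as in this isolated sketch:

```python
# Frames at which inference ran (a 24 FPS video sampled at 5 FPS).
frame_offset = [0, 5, 10, 14, 19]

def nearest_inference_frame(index: int) -> int:
    # Pick the analyzed frame closest to the current video frame.
    return min(frame_offset, key=lambda x: abs(x - index))

print(nearest_inference_frame(7))   # 5
print(nearest_inference_frame(12))  # 10
```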

If you are using a segmentation model, replace sv.BoundingBoxAnnotator() with sv.MaskAnnotator() to visualize segmentation results.

Let’s run our code. Here are the results of predictions plotted on a video:


We have successfully plotted predictions from the Roboflow Video Inference API on our video.

The supervision library we used to plot predictions has a range of utilities for use in working with object detection models. For example, you can filter predictions by confidence, count predictions in a specified zone, and more. To learn more about building computer vision applications with `supervision`, refer to the supervision documentation.


Conclusion

You can use the Roboflow Video Inference API to run computer vision models on videos stored in Google Cloud Storage. In this guide, we demonstrated how to analyze videos stored in Google Cloud Storage using a public computer vision model hosted on Roboflow Universe.

First, we authenticated with Google Cloud Storage. Then, we generated signed URLs for each video in a bucket and sent them to the Roboflow Video Inference API for analysis. We requested that the Roboflow Video Inference API run a football player detection model on our video. You can analyze videos with models on Roboflow Universe, models you have trained privately on Roboflow, or foundation models such as CLIP.