This guide was contributed to Roboflow by Charlie Macnamara.

Working with digital media is a routine part of computer vision work. Preprocessing tasks, including frame extraction, format conversion, and quality adjustments, play a crucial role in image and video analysis.

One tool that excels at these tasks is FFmpeg. Trusted by major projects like VLC, YouTube, and OBS, FFmpeg is a collection of libraries and programs started by Fabrice Bellard in 2000 and developed collaboratively ever since. From converting video formats to efficiently resizing images and extracting audio, FFmpeg has become the preferred solution for many multimedia workflows.

In this guide, we are going to discuss the basics of using FFmpeg for computer vision tasks. We will show examples of converting formats, merging video files, and splitting a video into separate frames for use in vision tasks.

Let’s get started!

How to Install FFmpeg

To set up FFmpeg on your system, begin by accessing the terminal. On Linux, the terminal application is readily available. For macOS users, locate the Terminal app in the Applications > Utilities folder. Windows users can opt for the Windows Subsystem for Linux (WSL) or install a Bash terminal like Git Bash.

Once in the terminal, update your package lists to ensure you acquire the latest version of FFmpeg. On Ubuntu or Debian systems, execute the following command:

sudo apt update

After updating, install FFmpeg and its dependencies using the package manager. On Ubuntu or Debian, use the following command:

sudo apt install ffmpeg

On macOS, you can install FFmpeg with Homebrew:

brew install ffmpeg

To verify that installation was successful, run the following command:

ffmpeg -version

This command should print the version number and build configuration of your FFmpeg installation.
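If you plan to call FFmpeg from Python later in your workflow, you can also check for the binary programmatically before running any commands. A minimal sketch:

```python
import shutil

# shutil.which returns the full path to the ffmpeg binary,
# or None if it is not on your PATH
ffmpeg_path = shutil.which('ffmpeg')

if ffmpeg_path is None:
    print('ffmpeg not found; install it before continuing')
else:
    print(f'ffmpeg found at {ffmpeg_path}')
```

This avoids confusing errors further down a pipeline when FFmpeg is missing.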

FFmpeg for Computer Vision

Convert Video Formats

Videos come in many formats and sizes, and compatibility issues are common. FFmpeg makes format conversion straightforward.

Consider converting an AVI video to an MP4 file. AVI is a common file format for storing footage from CCTV cameras, but many computer vision systems work with either MOV or MP4. We can convert our data to MP4 for use in further processing with a vision system (e.g., running an object detection model) using the following command:

ffmpeg -i input.avi -c:v libx264 -c:a aac -strict experimental output.mp4

To convert from MOV to MP4, another common conversion, you can use:

ffmpeg -i input.mov -vcodec h264 -acodec mp2 output.mp4
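FFmpeg can also merge (concatenate) multiple clips into one file. A sketch of one approach, assuming the clips share the same codecs and container (the filenames below are placeholders), uses FFmpeg's concat demuxer:

```shell
# Write a manifest listing the clips to join, in order
printf "file 'part1.mp4'\nfile 'part2.mp4'\n" > list.txt

# Concatenate without re-encoding; -safe 0 permits arbitrary file paths
ffmpeg -f concat -safe 0 -i list.txt -c copy merged.mp4
```

Because `-c copy` avoids re-encoding, this is fast and lossless, but it only works when the inputs use matching codecs; otherwise, re-encode with explicit `-c:v`/`-c:a` options instead.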

Split a Video Into Frames

You can split a video into image files that represent individual frames using FFmpeg. This is useful when you want to turn a video into training data for a computer vision model, since many common model types (e.g., object detection and classification models) are trained on annotated images rather than videos.

To split a video into frames, run:

ffmpeg -i input.mp4 -vf fps=1 output_%04d.png

You can configure how often frames are captured using the fps option. In the example above, fps is set to 1, so one frame is captured every second and saved following the output_%04d.png pattern, where %04d is replaced with the zero-padded frame number (output_0001.png, output_0002.png, and so on).
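As a plain-Python illustration (no FFmpeg required), you can estimate how many frames the command will produce for a clip of a given length, and preview how the %04d pattern numbers the output files; the durations below are hypothetical:

```python
import math

def expected_frames(duration_seconds, fps):
    # Approximate count of frames written by -vf fps=N
    return math.ceil(duration_seconds * fps)

# A 90-second clip sampled at one frame per second
print(expected_frames(90, 1))    # 90

# A 60-second clip sampled once every two seconds
print(expected_frames(60, 0.5))  # 30

# The %04d pattern zero-pads the frame number to four digits
print('output_%04d.png' % 3)     # output_0003.png
```

Lowering fps is a quick way to keep datasets manageable when consecutive frames are nearly identical.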

Use FFmpeg to Upload Videos to Roboflow

We can use FFmpeg with the Python subprocess library to divide a video into a folder of images for use in training a computer vision model.

To do so, you will need a Roboflow account and a project in your workspace. To create a project, first go to your Roboflow dashboard. Then, click “Create a Project”.

You can upload videos through our web interface, or you can upload them programmatically. A programmatic upload is ideal if you have a large volume of data that you need to upload.

Create a new Python file and add the following code:

import os
import subprocess
from roboflow import Roboflow

def video_to_frames(video_path, output_folder):
    # Create the output folder if it does not already exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    # Run ffmpeg to extract every frame as a numbered PNG
    command = [
        'ffmpeg',
        '-i', video_path,
        os.path.join(output_folder, 'frame_%04d.png')
    ]
    subprocess.run(command, check=True)

def upload_frames_to_roboflow(api_key, workspace_id, project_id, frames_folder):
    rf = Roboflow(api_key=api_key)
    project = rf.workspace(workspace_id).project(project_id)

    # Upload each extracted frame to the project
    for frame in sorted(os.listdir(frames_folder)):
        project.upload(os.path.join(frames_folder, frame))

if __name__ == "__main__":
    roboflow_api_key = 'my-api-key'
    roboflow_workspace_id = 'my-workspace'
    roboflow_project_id = 'my-project-id'
    video_path = 'input.mp4'
    frames_folder = 'outputs/'

    video_to_frames(video_path, frames_folder)
    upload_frames_to_roboflow(roboflow_api_key, roboflow_workspace_id, roboflow_project_id, frames_folder)

We define a function called `video_to_frames`, which uses `ffmpeg` to convert a video into sequentially numbered PNG frames. `upload_frames_to_roboflow` connects to Roboflow, accesses the specified project, and uploads each frame to it.

In the code, replace:

  1. roboflow_api_key with your Roboflow API key. Learn how to retrieve your API key.
  2. roboflow_workspace_id with your Roboflow workspace ID and roboflow_project_id with your Roboflow project ID. Learn how to retrieve your workspace and project IDs.
  3. video_path with the name of the video you want to upload to Roboflow.

The code converts the video to frames using video_to_frames, then uploads them to the designated Roboflow project with upload_frames_to_roboflow. Automating both steps streamlines turning raw videos into image datasets ready for annotation.

Run the script above to upload images to your workspace. The images will appear on the “Annotate” tab on your Roboflow dashboard.


Conclusion

FFmpeg is a command line utility with which you can perform many video manipulations for computer vision use cases.

You can convert videos between formats, merge two or more videos, split a video into frames for use in further processing, and more. In this guide, we demonstrated how to complete these tasks with FFmpeg. We then showed how to upload video frames to Roboflow using Python and the FFmpeg command line tool.