Open Source Computer Vision Deployment with Roboflow Inference

Today, we are open sourcing the Roboflow Inference Server: our solution for using and deploying computer vision models in production, used to power millions of production model inferences. We are also announcing Roboflow Inference, an opinionated framework for creating standardized APIs around computer vision models.

Roboflow Deploy powers millions of daily inferences across thousands of models for hundreds of customers (including some of the world’s largest companies), and now we’re making the core technology available to the community under a permissive, Apache 2.0 license.

We hope this release accelerates the graduation of cutting-edge computer vision models from the realm of research and academia into the world of real applications powering real businesses.

pip install inference

Roboflow Inference lets you get predictions from computer vision models through a simple, standardized interface. It supports a variety of model architectures for tasks like object detection, instance segmentation, single-label classification, and multi-label classification. It works seamlessly with custom models you’ve trained or deployed with Roboflow, along with the tens of thousands of fine-tuned models shared by our community.

To install the package on a CPU device, run:

pip install inference

To install the package on a GPU device, run:

pip install inference-gpu

Supported Fine-Tuned Models

Currently, Roboflow Inference has plugins implemented to serve the following architectures:

Object Detection

Ultralytics YOLOv8
Ultralytics YOLOv5

Instance Segmentation

Ultralytics YOLOv8
Ultralytics YOLOv5
YOLOv7
YOLACT

Single-Label Classification

Ultralytics YOLOv8
ViT

Multi-Label Classification

The next models to be supported will be the Autodistill base models. We’ll add more models based on customer and community demand. If there’s a model you’d like to see added, please open an issue (or submit a PR)!

Implementing New Models

Roboflow Inference is designed with extensibility in mind. Adding your own proprietary model is as simple as implementing an infer function that accepts an image and returns a prediction.

We will be publishing documentation on how to add new architectures to inference soon!
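As a rough illustration of the idea (not the project's actual plugin interface, which had not been documented at the time of writing), a custom model could look something like the sketch below. The `Model` protocol, the `Prediction` type, the list-of-rows image format, and the brightness rule are all hypothetical; the only thing taken from the text above is an `infer` function that accepts an image and returns a prediction.

```python
from dataclasses import dataclass
from typing import List, Protocol

# Stand-in for a real image: rows of 8-bit grayscale pixel values (assumption).
Image = List[List[int]]


@dataclass
class Prediction:
    """Hypothetical prediction container; not a type from the inference package."""
    class_name: str
    confidence: float


class Model(Protocol):
    """Hypothetical interface: anything with an infer(image) method qualifies."""

    def infer(self, image: Image) -> Prediction: ...


class BrightnessClassifier:
    """Toy 'model' that labels an image as bright or dark (illustrative only)."""

    def infer(self, image: Image) -> Prediction:
        pixels = [value for row in image for value in row]
        # Mean intensity scaled to [0, 1], assuming 8-bit pixel values.
        brightness = sum(pixels) / (255.0 * len(pixels))
        if brightness >= 0.5:
            return Prediction("bright", brightness)
        return Prediction("dark", 1.0 - brightness)


def run(model: Model, image: Image) -> Prediction:
    # Callers depend only on the infer() contract, not on a concrete class.
    return model.infer(image)


print(run(BrightnessClassifier(), [[255, 255], [255, 255]]))
```

The point of the sketch is the shape of the contract: code that serves predictions only needs to call `infer`, so swapping in a different architecture does not change the calling code.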

Foundation Models

Support for generic models like CLIP and SAM is already implemented. These models often complement fine-tuned models (for example, see how Autodistill uses foundation models to train supervised models).

We plan to add other generic models soon for tasks like OCR, pose estimation, captioning, and visual question answering.

The Inference Server

The Roboflow Inference Server is an HTTP microservice interface for inference. It supports many different deployment targets via Docker and is optimized to route and serve requests from edge devices or via the cloud in a standardized format. (If you’ve ever used Roboflow’s Hosted API, you’ve already used our Inference Server!)

Additionally, when you want to go beyond the basic functionality, the inference server has plug-ins that seamlessly integrate with Roboflow’s platform for model management, automated active learning, advanced monitoring, and device administration.

Getting predictions from your model is as simple as sending an HTTP POST request:

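import requests

BASE_URL = "http://localhost:9001"

# Placeholder values for illustration; replace them with your own.
# See the parameter list below and the Inference Quickstart for accepted ranges.
model_id = "your-project/1"
api_key = "YOUR_ROBOFLOW_API_KEY"
image_url = "https://example.com/image.jpg"
confidence = 0.5
overlap = 0.5
max_detections = 100

# Passing the query parameters as a dict lets requests URL-encode them,
# so an image URL containing "&" or "?" cannot corrupt the query string.
res = requests.post(
    f"{BASE_URL}/{model_id}",
    params={
        "api_key": api_key,
        "confidence": confidence,
        "overlap": overlap,
        "image": image_url,
        "max_detections": max_detections,
    },
)
res.raise_for_status()  # raise an exception on HTTP error responses

print(res.json())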

Where:

model_id: The ID of your model on Roboflow. See the Roboflow documentation to learn how to find your model ID.
api_key: Your Roboflow API key. Learn how to retrieve your Roboflow API key.
confidence: The minimum confidence level that must be met for a prediction to be returned.
overlap: The IoU threshold used to suppress duplicate predictions. When two predictions overlap by more than this value, only the higher-confidence one is returned.
image_url: The URL of the image on which you want to run inference. This can also be a base64 string or a NumPy array.
max_detections: The maximum number of detections to return.
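
To make the confidence, overlap, and max_detections parameters more concrete, here is a minimal sketch of how thresholds like these are commonly applied in object detection post-processing (confidence filtering followed by non-maximum suppression). It is not the Inference Server's actual implementation: the box format and the `iou` and `filter_boxes` helpers are hypothetical, and the sketch ignores class labels for brevity.

```python
from typing import List, Tuple

# Simplified, hypothetical box format: (x_min, y_min, x_max, y_max, confidence).
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_boxes(
    boxes: List[Box], confidence: float, overlap: float, max_detections: int
) -> List[Box]:
    """Apply a confidence threshold, then greedy non-maximum suppression."""
    # 1. Drop low-confidence boxes and visit the rest from most to least confident.
    candidates = sorted(
        (b for b in boxes if b[4] >= confidence), key=lambda b: b[4], reverse=True
    )
    kept: List[Box] = []
    for box in candidates:
        # 3. Stop once the detection limit is reached.
        if len(kept) >= max_detections:
            break
        # 2. Keep a box only if it does not overlap a kept box by more than `overlap`.
        if all(iou(box, k) <= overlap for k in kept):
            kept.append(box)
    return kept


boxes = [
    (0, 0, 10, 10, 0.9),    # strong detection
    (1, 1, 11, 11, 0.8),    # near-duplicate of the first box
    (20, 20, 30, 30, 0.7),  # separate object
    (40, 40, 50, 50, 0.2),  # below the confidence threshold
]
print(filter_boxes(boxes, confidence=0.5, overlap=0.5, max_detections=10))
```

In this example the second box is suppressed because it overlaps the first by more than the overlap threshold, and the fourth is dropped for low confidence, leaving the first and third boxes.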

For more information on getting started, check out the Inference Quickstart.

Roboflow Managed Inference

While some users choose to self-host the Inference Server for network, privacy, and compliance purposes, Roboflow also offers our Hosted API as a fully turn-key serverless inference solution. It already serves millions of inferences per day, powering rapid prototyping and supporting mission-critical systems in industries from manufacturing to healthcare.

At scale, we also manage dedicated Kubernetes clusters of auto-scaling GPU machines so that our customers don’t need to allocate valuable MLOps resources to scaling their computer vision model deployment. We have tuned our deployments to maximize GPU utilization, so our managed solution is often much cheaper than building your own. If you need a VPC deployment inside your own cloud, that’s available as well. Contact sales for more information about enterprise deployment.

Model Licensing

While Roboflow Inference (and the Roboflow Inference Server) are licensed under the permissive Apache 2.0 open source license, some of the supported models use different licenses (including copyleft licenses such as GPL and AGPL in some cases). For models you train on your own, check that the model’s license supports your business use case.

For any model you train using Roboflow Train (and some other models), Roboflow’s paid plans include a commercial license for deployment via inference and the Inference Server so long as you follow your plan’s usage limits.

Start Using Roboflow Inference Today

Roboflow Inference is at the heart of what we do at Roboflow: providing powerful technologies with which you can build and deploy computer vision models that solve your business needs. We actively use Roboflow Inference internally, and are committed to improving the server to provide more functionality.

Over the next few weeks and months, we will be working on letting you bring models that are not hosted on Roboflow to Roboflow Inference, on device management tools that let you monitor whether your servers are running, and more.

Is there a feature you would like to see in Roboflow Inference that we do not currently support? Open an issue on the project GitHub repository and we will evaluate your request.

Because the project is open source, you can extend the inference server to meet your needs. Want to use a model we don't currently support? You can build it into the server and use the same HTTP-based API the server exposes for inference.

If you would like to help us add new models to the Inference Server, open an issue on the project GitHub repository. We will let you know whether work on that model is already under way. If no work has started, you can add the model from scratch; if a contributor is already adding it, we can point you to where you can help. Check out the project contribution guidelines for more information.

Cite this Post

Use the following entry to cite this post in your research:

Brad Dwyer, James Gallagher. (Aug 16, 2023). Open Source Computer Vision Deployment with Roboflow Inference. Roboflow Blog: https://blog.roboflow.com/open-source-inference-server/

Discuss this Post

If you have any questions about this blog post, start a discussion on the Roboflow Forum.
