Computer Vision

How to Moderate Video Content

Learn how to use the Roboflow Video Inference API to moderate video content.

How to Deploy Computer Vision Models Offline

In this guide, we walk through how to deploy computer vision models (e.g. YOLOv8) offline using Roboflow Inference.

How to Blur People in Images and Videos with an API

In this guide, we show how to use the Roboflow Video Inference API and supervision to blur people in images and videos.

Automatically Label Product SKUs with Autodistill

In this guide, we show how to automatically label product SKUs (with a manual review stage) using Autodistill.

Multimodal Maestro: Advanced LMM Prompting

Learn how to expand the range of LMMs' capabilities using Multimodal Maestro.

Manufacturing to Computer Vision: Three Applications From Field Experience

In this article, we explore three applications of computer vision in the manufacturing industry, written by an expert with field experience.

How to Load Image Embeddings into Pinecone

In this guide, learn how to calculate CLIP embeddings with Roboflow Inference and save the results in a Pinecone vector database.

Roboflow Video Inference with Custom Annotators

Performing real-time video inference is crucial for many applications like autonomous vehicles, security systems, logistics, and more. However, setting up a robust video inference pipeline can be time-consuming. You

How to Load CLIP Image Embeddings into LanceDB

Learn how to calculate CLIP embeddings using Roboflow Inference and save them into LanceDB.

GPT-4 Vision Alternatives

Explore alternatives to GPT-4 Vision with Large Multimodal Models such as Qwen-VL and CogVLM, and fine-tuned detection models.

How to Search Video Frames with Roboflow

Build a search engine that lets you find frames in a video with text queries using Roboflow Inference.

Launch: Roboflow Video Inference API

In this post, we introduce the Roboflow Video Inference API, a hosted solution for running fine-tuned and foundation models on videos.

What is Optical Character Recognition (OCR)?

Learn what Optical Character Recognition is, what problems can be solved with OCR, and explore the approaches used by OCR algorithms to identify characters.

What is Object Recognition?

In this guide, we discuss what object recognition is, how it works, and how to start using object recognition to solve problems.

What is Retrieval Augmented Generation?

Learn what Retrieval Augmented Generation (RAG) is, how it works, and how RAG can be used in computer vision applications.

What is Zero-Shot Classification?

Learn what zero-shot classification is, what it is used for, and how to use it to solve computer vision problems.

What is an Image Embedding?

Learn what image embeddings are and explore four use cases for embeddings: classifying images, classifying video, clustering images, and image search.

What is Zero-Shot Object Detection?

Learn what zero-shot object detection is, what it can be used for, and how to get started with Grounding DINO, a zero-shot model.

Distilling GPT-4 for Classification with an API

In this guide, learn how to distill GPT-4V to train an image classification model.

DINO-GPT4-V: Use GPT-4V in a Two-Stage Detection Model

In this guide, we introduce DINO-GPT4-V, a model that uses Grounding DINO to detect general objects and GPT-4V to refine labels.

How CLIP and GPT-4V Compare for Classification

In this post, we analyze how CLIP and GPT-4V compare for classification.

Experiments with GPT-4V for Object Detection

See our experiments that explore GPT-4V's object detection capabilities.

How to Use MetaCLIP

Learn what MetaCLIP is, how the model performs on benchmarks, and how to use MetaCLIP.

How to Provide Detailed Labeling Instructions to Outsourced Labelers

In this guide, we walk through tips and best practices for giving detailed, useful labeling instructions to outsourced labelers.

How to Detect Text in Images with OCR

This guide shows how to use the Roboflow OCR API as part of a two-stage detection system that identifies regions of interest and reads text in them.