15 Aug 2024 • 8 min read • Measure Fish Size using Computer Vision • Learn how to use computer vision to measure the size of fish.
14 Aug 2024 • 6 min read • How to Create a Workout Pose Correction Tool • This tutorial builds a real-time workout form checker using a custom Roboflow keypoint detection model trained to locate the left and right sides of a weight, combined with MediaPipe body tracking and deployed through Roboflow Workflows. The pipeline processes each video frame, extracts keypoints from both sources, and
14 Aug 2024 • 8 min read • How to Build a Reading Assistant with AI • This tutorial builds an interactive reading assistant that detects a finger pointing at a word on a page using an object detection model, reads the word using OCR, and then calls GPT-4 to pronounce it aloud, all wired together inside a Roboflow Workflow and run as a live
12 Aug 2024 • 6 min read • How to Deploy Hugging Face Models with Roboflow • Computer vision model weights hosted on Hugging Face, including YOLOv8 variants for detection, segmentation, classification, and keypoint tasks, can be downloaded with Git-LFS and uploaded directly to Roboflow for edge or private-cloud deployment via Roboflow Inference. This guide walks through each step, from cloning model weights to
8 Aug 2024 • 5 min read • Automatic Stop Sign Violation Detection • This community-contributed project determines whether a vehicle correctly stops at a stop sign by chaining RF-DETR vehicle detection, a fine-tuned front-wheel detection model, and Roboflow PolygonZone geometry checks across a stop zone and an out zone. Vehicles that fail to stop before the white line
8 Aug 2024 • 8 min read • What is Optical Character Verification? A Comprehensive Guide • Optical Character Verification (OCV) reads printed text on product packaging using OCR and then compares the result against known reference data to catch errors in expiration dates, batch numbers, barcodes, and lot codes before products leave the line. In consumer packaged goods manufacturing, OCV automates a check that would
8 Aug 2024 • 8 min read • Camera Calibration in Sports with Keypoints • Accurate player tracking in sports footage requires mapping pixel coordinates from a moving camera to real-world field positions, and this tutorial covers how to do that by training a YOLOv8 keypoint detection model to recognize characteristic landmarks on a soccer pitch, then using those detections to compute a
7 Aug 2024 • 8 min read • How to Extract Text From Images • Optical Character Recognition (OCR) converts static image content in JPG, PNG, or PDF formats into machine-readable text, enabling automation of data entry, accessibility improvements, translation workflows, and document analysis. Roboflow's OCR API is one practical tool for this, offering a programmatic way to extract text at
6 Aug 2024 • 7 min read • Ball Tracking in Sports with Computer Vision • Ball tracking is crucial for AI systems to analyze sports effectively, but it's challenging due to factors like the ball's small size, high velocity, complex backgrounds, similar-looking objects, and varying lighting. This tutorial will teach you how to overcome these challenges.
2 Aug 2024 • 3 min read • How to Import Hugging Face Datasets to Roboflow • Learn how to import a Hugging Face dataset into Roboflow for labeling, training, and deployment.
1 Aug 2024 • 7 min read • How to Use SAM 2 for Video Segmentation • Segment Anything Model 2 (SAM 2) is a unified image and video segmentation model from Meta that needs 3 times fewer interactions than prior approaches for video segmentation and runs 6 times faster than the original SAM on images. This walkthrough covers how to load and configure SAM 2 (available in four sizes
30 Jul 2024 • 8 min read • What is Segment Anything 2 (SAM 2)? • Learn about Meta AI's new Segment Anything 2 model and how you can use it for image and grounded image segmentation.
19 Jul 2024 • 5 min read • Evaluating 2024 Euro Cup and COPA America Cup Jersey Color Accessibility • Read how we built a system to evaluate the color contrast of football jerseys used in the 2024 Euro Cup.
19 Jul 2024 • 12 min read • Tomato Leaf Disease Detection and Diagnosis using Computer Vision • Learn how to build a tomato leaf disease detection and diagnosis system with computer vision.
19 Jul 2024 • 8 min read • People Counting Using Computer Vision • Computer vision-based people counting tracks entry and exit flows in real time, giving organizations accurate occupancy data that manual headcounts cannot match at scale. This tutorial uses a people detection model from Roboflow to build a working counter that draws bounding boxes, overlays a running tally on each
17 Jul 2024 • 9 min read • Top 7 Open Source Object Tracking Tools [2026] • Object tracking in computer vision assigns persistent identities to detected objects across video frames, enabling applications like assembly line monitoring, warehouse inventory tracking, and pedestrian flow analysis. This post surveys seven open-source tracking algorithms, ByteTrack, Norfair, MMTracking, DeepSORT, FairMOT, BoT-SORT, and StrongSORT, covering how each handles association,
16 Jul 2024 • 8 min read • What is the Open Images Dataset? A Deep Dive. • Open Images V7, released by Google in 2022, contains over nine million annotated images spanning nearly 20,000 categories and supports six annotation types: image-level labels, bounding boxes, segmentation masks, visual relationships, localized narratives, and point-level labels. That breadth makes it one of the few public datasets
11 Jul 2024 • 11 min read • How to Train RT-DETR on a Custom Dataset with Transformers • RT-DETR (Real-Time DEtection TRansformer), developed by Peking University and Baidu, is a transformer-based object detection model that aims to match YOLO-family models on speed and accuracy. This tutorial covers fine-tuning RT-DETR on a custom dataset sourced from Roboflow Universe, using the Hugging Face Transformers
10 Jul 2024 • 16 min read • What is Thresholding in Image Processing? A Guide. • Learn what image thresholding is and the thresholding strategies you can use in computer vision applications.
10 Jul 2024 • 7 min read • The Guide to AI OCR • Learn what AI OCR is and how it is used in computer vision.
10 Jul 2024 • 5 min read • How to Use Florence-2 for Optical Character Recognition • Learn how to use the Florence-2 model for Optical Character Recognition tasks.
10 Jul 2024 • 5 min read • What Is Dense Image Captioning? • Learn what dense image captioning is and how to use the MIT-licensed Florence-2 model to generate dense image captions.
10 Jul 2024 • 5 min read • What Is FPS? A Computer Vision Guide • Learn what FPS is and what FPS considerations you should keep in mind when working on computer vision projects.
9 Jul 2024 • 7 min read • What is 4M? Apple's Massively Multimodal Masked Modeling • Apple's 4M (Massively Multimodal Masked Modeling) framework trains a single unified Transformer encoder-decoder across text, images, 3D geometry, and semantic data by mapping all modalities to discrete tokens and applying a masked modeling objective on a random subset of them. Unlike specialized vision models constrained to