Latest Posts

Document Understanding with Multimodal Models

Learn how to use the PaliGemma multimodal model to ask questions about the contents of a document.

Visual Question Answering with Multimodal Models

Learn how to use the PaliGemma multimodal model to ask questions about images.

Understand Website Screenshots with a Multimodal Vision Model

Learn how to use the Florence-2 multimodal model to generate rich descriptions of website screenshots.

How to Caption Images with a Multimodal Vision Model

Learn how to caption images using a multimodal vision model.

How to Train RT-DETR on a Custom Dataset with Transformers

RT-DETR, short for "Real-Time DEtection TRansformer", is a computer vision model developed by Peking University and Baidu. In their paper, "DETRs Beat YOLOs on Real-time Object Detection"…

Build Computer Vision Applications with Roboflow and Gradio

Learn how to build computer vision applications with Roboflow and Gradio.

What is Thresholding in Image Processing? A Guide.

Learn what image thresholding is and the thresholding strategies you can use in computer vision applications.

The Guide to AI OCR [2024]

Learn what AI OCR is and how it is used in computer vision.

How to Use Florence-2 for Optical Character Recognition

Learn how to use the Florence-2 model for Optical Character Recognition tasks.

What is Dense Image Captioning?

Learn what dense image captioning is and how to use the MIT-licensed Florence-2 model to generate dense image captions.

What is FPS? A Computer Vision Guide.

Learn what FPS is and what FPS considerations you should keep in mind when working on computer vision projects.

What is 4M? Apple's Massively Multimodal Masked Modeling

4M: Massively Multimodal Masked Modeling, released by Apple in 2024, is a leap forward in the field of multimodal machine learning. This model, building upon the growing capabilities of large…

How to Use Florence-2 for Instance Segmentation

Florence-2 is a lightweight model licensed under the MIT license. Although it has significantly fewer parameters than competing models like LLaVA 1.5, Florence-2 remains state-of-the-art due to the high-quality…

Using GPT-4 To Extract Handwritten Text from Images

This guide walks you through the process of building, training, and deploying a custom computer vision workflow using OpenAI and Roboflow. The process is broken down into three steps: building…

Computer Vision Solutions for Steel Manufacturing with Roboflow

Learn how computer vision can be used in steel manufacturing facilities.

How to Monitor Productivity with Eye Tracking

Focusing is hard. In recent years, the number of distractions available to us has been increasing, and we often lose track of how much we are distracted. To help myself…

Launch: Roboflow Project Folders

Learn how to use Roboflow's Project Folders to organize projects in your account workspaces.

How to Set Up a Basler Camera on a Jetson

Learn how to set up a Basler camera on an NVIDIA Jetson for use in running computer vision models.

What is F1 Score? A Computer Vision Guide.

Learn what F1 score is, what it is used for, and how to calculate it.

How to Fine-tune Florence-2 for Object Detection Tasks

This tutorial will show you how to fine-tune Florence-2 on object detection datasets to improve model performance for your specific use case.

What is Non-Max Merging?

Learn what Non-Max Merging (NMM) is and how to use NMM with computer vision model predictions.

Florence-2: Open Source Vision Foundation Model by Microsoft

Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license.

How to Detect Small Objects: A Guide

Learn how to detect small objects using SAHI with the Inference Slicer Python method, as well as with various pre-processing steps.

Launch: Deploy YOLOv10 Models with Roboflow

Learn how to deploy a YOLOv10 model on Roboflow.

How To Train and Deploy an ANPR System

Learn how to train and deploy a license plate detection model for use in building an ANPR system.