2 Jan 2025 • 8 min read What is Few-Shot Learning? In this blog post, we discuss what few-shot learning is, architectural approaches for implementing few-shot learning, and specific implementations of few-shot learning techniques.
10 Dec 2024 • 13 min read How to Fine-tune PaliGemma 2 Learn how to fine-tune PaliGemma 2 to extract data from an image in JSON format.
16 Oct 2024 • 6 min read Launch: Use Florence-2 in Roboflow Workflows Learn how to use Florence-2 in Roboflow Workflows for zero-shot object detection, OCR, and more.
3 Sep 2024 • 7 min read Table and Figure Understanding with Computer Vision Learn how to use Roboflow Workflows and multimodal models to derive information about the contents of tables and figures.
1 Sep 2024 • 15 min read What is CLIP? Contrastive Language-Image Pre-Processing Explained. CLIP is an open source, multimodal computer vision model developed by OpenAI. Learn what CLIP is in this guide.
22 Jul 2024 • 7 min read How to OCR Hand-Written Notes with GPT-4 Learn how to OCR hand-written notes with GPT-4.
12 Jul 2024 • 3 min read Document Understanding with Multimodal Models Learn how to use the PaliGemma multimodal model to ask questions about the contents of a document.
12 Jul 2024 • 3 min read Visual Question Answering with Multimodal Models Learn how to use the PaliGemma multimodal model to ask questions about images.
12 Jul 2024 • 4 min read Understand Website Screenshots with a Multimodal Vision Model Learn how to use the Florence-2 multimodal model to generate rich descriptions of website screenshots.
12 Jul 2024 • 4 min read How to Caption Images with a Multimodal Vision Model Learn how to caption images using a multimodal vision model.
10 Jul 2024 • 5 min read How to Use Florence-2 for Optical Character Recognition Learn how to use the Florence-2 model for Optical Character Recognition tasks.
10 Jul 2024 • 4 min read What is Dense Image Captioning? Learn what dense image captioning is and how to use the MIT-licensed Florence-2 model to generate dense image captions.
25 Jun 2024 • 12 min read How to Fine-tune Florence-2 for Object Detection Tasks This tutorial will show you how to fine-tune Florence-2 on object detection datasets to improve model performance for your specific use case.
20 Jun 2024 • 5 min read Florence-2: Open Source Vision Foundation Model by Microsoft Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license.
17 May 2024 • 7 min read How to Fine-tune PaliGemma for Object Detection Tasks Learn how to fine-tune the PaliGemma multimodal model to detect custom objects.
17 May 2024 • 8 min read Finetuning Moondream2 for Computer Vision Tasks In this guide, we finetune and improve Moondream2, a small, local, fast multimodal Vision Language Model, for a computer vision task.
15 May 2024 • 10 min read PaliGemma: An Open Multimodal Model by Google PaliGemma is a vision language model (VLM) developed and released by Google that has multimodal capabilities. Learn how to use it.
14 May 2024 • 10 min read GPT-4o: The Comprehensive Guide and Explanation Learn what GPT-4o is, how it differs from previous models, evaluate its performance, and use cases for GPT-4o.
26 Mar 2024 • 9 min read Ultimate Guide to Using CLIP with Intel Gaudi2 Learn how to use CLIP on the Intel Gaudi2 chip. This guide discusses training and deploying a custom CLIP model on Gaudi2.
21 Mar 2024 • 4 min read Launch: YOLO-World Support in Roboflow Learn how you can use YOLO-World with Roboflow.
16 Mar 2024 • 6 min read Best OCR Models for Text Recognition in Images See how nine different OCR models compare for scene text recognition across industrial domains.
13 Mar 2024 • 10 min read What is Visual Question Answering (VQA)? Learn what Visual Question Answering (VQA) is, how it works, and explore models commonly used for VQA.
5 Mar 2024 • 6 min read First Impressions with the Claude 3 Opus Vision API The Roboflow team ran several computer vision tests using the Claude 3 Opus Vision API. Read our results.
3 Mar 2024 • 6 min read Multimodal Video Analysis with CLIP using Intel Gaudi2 HPUs Learn how to use CLIP and the Intel Gaudi2 chip to run multimodal analyses and classification on videos.