10 Jul 2024 • 4 min read What is Dense Image Captioning? Learn what dense image captioning is and how to use the MIT-licensed Florence-2 model to generate dense image captions.
10 Jul 2024 • 5 min read What is FPS? A Computer Vision Guide. Learn what FPS is and what FPS considerations you should keep in mind when working on computer vision projects.
9 Jul 2024 • 7 min read What is 4M? Apple's Massively Multimodal Masked Modeling 4M: Massively Multimodal Masked Modeling, released by Apple in 2024, is a leap forward in the field of multimodal machine learning. This model, building upon the growing capabilities of large language models, addresses critical challenges in vision models which have traditionally been highly specialized and limited to a single modality
9 Jul 2024 • 5 min read How to use Florence-2 for Instance Segmentation Florence-2 is a lightweight model licensed under the MIT license. Although it has significantly fewer parameters than competing models like LLaVA 1.5, Florence-2 remains state-of-the-art due to the high-quality data it was trained on. Florence-2 is capable of a variety of tasks, including visual question answering, captioning, image detection,
5 Jul 2024 • 5 min read How to Use GPT-4 To Extract Handwritten Text from Images This guide walks you through the process of building, training, and deploying a custom computer vision workflow using OpenAI and Roboflow. The process is broken down into three steps: * Building the model * Connecting the model to a Workflow * Writing code to get the outputs Through
28 Jun 2024 • 19 min read How to Monitor Productivity with Eye Tracking Focusing is hard. In recent years, the number of distractions available to us has been increasing, and we often lose track of how distracted we are. To help myself stay engaged, I created a project that accurately tracks how many times I'm distracted in a certain period
27 Jun 2024 • 8 min read What is F1 Score? A Computer Vision Guide. Learn what F1 score is, what it is used for, and how to calculate it.
20 Jun 2024 • 5 min read Florence-2: Vision-language Model Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license.
14 Jun 2024 • 18 min read Edge Detection in Image Processing: An Introduction Learn what edge detection is and how to apply common edge detection algorithms to an image.
9 Feb 2024 • 7 min read Use Cases for Computer Vision in Healthcare In this guide, we explore use cases for computer vision in healthcare, from pill counting to building automated inventory management systems.
25 Sep 2023 • 6 min read What is DETR (Detection Transformers)? In this guide, we discuss what DETR is, how it works, its strengths and weaknesses, and how it performs.
19 Aug 2021 • 5 min read ImageNet contains naturally occurring NeuralHash collisions NeuralHash is the perceptual hashing model that backs Apple's new CSAM (child sexual abuse material) reporting mechanism. It's an algorithm that takes an image as input and returns a 96-bit unique identifier (a hash) that should match for two images that are "the