Piotr Skalski

ML Growth Engineer @ Roboflow | Owner @ github.com/SkalskiP/make-sense (2.4k stars) | Blogger @ skalskip.medium.com/ (4.5k followers)

Latest Posts by Piotr Skalski

Detect NBA 3 Second Violations with AI

22 Jul 2025 • 7 min read

Ever shouted at the screen that a player was camping in the paint? You're not alone. Basketball is a fast-paced, dynamic sport with many rules that apply to each player, which makes it difficult for a referee to monitor everything. The 3-second rule in basketball, which prevents

How to Train RF-DETR on a Custom Dataset

20 Mar 2025 • 7 min read

Learn how to train an RF-DETR model on a custom dataset.

How to Fine-tune PaliGemma 2

10 Dec 2024 • 13 min read

Learn how to fine-tune PaliGemma 2 to extract data from an image in JSON format.

How to Fine-Tune GPT-4o for Object Detection

3 Oct 2024 • 12 min read

Learn how to fine-tune GPT-4o to detect the location of objects in images.

Camera Calibration in Sports with Keypoints

8 Aug 2024 • 7 min read

Camera calibration is important for accurate vision AI systems that analyse sports. It allows players' movement on a video frame to be mapped to real movement on the field, which makes it possible to track the distance they cover, their direction, and the speed at which they move. Homography is commonly

Ball Tracking in Sports with Computer Vision

6 Aug 2024 • 7 min read

Ball tracking is crucial for AI systems to analyze sports effectively, but it's challenging due to factors like the ball's small size, high velocity, complex backgrounds, similar-looking objects, and varying lighting. This tutorial will teach you how to overcome these challenges.

How to Use SAM 2 for Video Segmentation

1 Aug 2024 • 7 min read

Segment Anything Model 2 (SAM 2) is a unified video and image segmentation model. Video segmentation presents unique challenges compared to image segmentation. Object motion, deformation, occlusion, lighting changes, and other factors can vary dramatically from frame to frame. Videos are often lower quality than images due to camera motion,

How to Train RT-DETR on a Custom Dataset with Transformers

11 Jul 2024 • 11 min read

💡 Looking for RF-DETR, the state-of-the-art real-time object detection model developed by Roboflow? Check out the RF-DETR training guide. RF-DETR runs in real time, is the first model to achieve 60+ mAP on COCO, and is state-of-the-art on the RF100-VL benchmark. RT-DETR, short for "Real-Time DEtection TRansformer", is a computer

How to Fine-tune Florence-2 for Object Detection Tasks

25 Jun 2024 • 12 min read

This tutorial will show you how to fine-tune Florence-2 on object detection datasets to improve model performance for your specific use case.

Florence-2: Vision-language Model

20 Jun 2024 • 5 min read

Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license.

How to Train a YOLOv10 Model on a Custom Dataset

24 May 2024 • 6 min read

Learn how to train a YOLOv10 model using a custom dataset.

How to Fine-tune PaliGemma for Object Detection Tasks

17 May 2024 • 7 min read

Learn how to fine-tune the PaliGemma multimodal model to detect custom objects.

How to Train YOLOv9 on a Custom Dataset

23 Feb 2024 • 9 min read

Learn how to train a YOLOv9 model on a custom dataset.

How to Detect Objects with YOLO-World

16 Feb 2024 • 5 min read

Learn how to detect objects with YOLO-World, a zero-shot, open-vocabulary object detection model.

YOLO-World: Real-Time, Zero-Shot Object Detection

13 Feb 2024 • 6 min read

YOLO-World is a zero-shot, real-time object detection model.

First Impressions with Gemini Advanced

8 Feb 2024 • 7 min read

Read our first impressions using the Gemini Ultra multimodal model across a range of computer vision tasks.

How to Use the Segment Anything Model (SAM)

22 Jan 2024 • 6 min read

Segment Anything (SAM) is a computer vision model developed by Meta AI. In this guide, you will learn how to use SAM on your own data.

How to Estimate Speed with Computer Vision

19 Jan 2024 • 6 min read

In this blog post, we delve into the process of estimating vehicle speed using computer vision, covering the steps from object detection to tracking and addressing challenges like perspective distortion with OpenCV.

How to Deploy CogVLM on AWS

20 Dec 2023 • 3 min read

Guide on deploying a CogVLM Inference Server with 4-bit quantization on Amazon Web Services, covering setup of EC2 instances, configuring hardware and software requirements, and starting the inference server with Docker.

Multimodal Maestro: Advanced LMM Prompting

29 Nov 2023 • 3 min read

Learn how to expand the range of LMMs' capabilities using Multimodal Maestro.

GPT-4 Vision Alternatives

23 Nov 2023 • 7 min read

Explore alternatives to GPT-4 Vision with Large Multimodal Models such as Qwen-VL and CogVLM, and fine-tuned detection models.

GPT-4 Vision Prompt Injection

16 Oct 2023 • 4 min read

In this article, we explore what prompt injection is and the techniques people have been using to perform prompt injection attacks on GPT-4.

First Impressions with LLaVA-1.5

10 Oct 2023 • 6 min read

In this guide, we share our first impressions testing LLaVA-1.5.

GPT-4 with Vision: Complete Guide and Evaluation

27 Sep 2023 • 11 min read

In this guide, we share our findings from experimenting with GPT-4 with Vision, released by OpenAI in September 2023.
