How to Train RF-DETR on a Custom Dataset

Released on March 19th, 2025, RF-DETR is a transformer-based object detection model architecture developed by Roboflow.

RF-DETR achieves state-of-the-art performance, beating models like LW-DETR and YOLOv11 on both COCO and the newly-introduced RF100-VL dataset. RF100-VL is a benchmark that aims to validate the generalizability of detection-capable models across a range of domains.

By scaling resolution to 728, RF-DETR achieves 60.5 mAP at 25 FPS on an NVIDIA T4 GPU, becoming the first documented real-time model to break the 60 mAP barrier on the Microsoft COCO benchmark.

RF-DETR is licensed under an Apache 2.0 license, which permits free commercial use.

In this guide, we are going to walk through how to train an RF-DETR model on a custom dataset. We will train a mahjong tile recognition model as an example, a task that involves identifying several different classes.

Here is an example of the results from the model we will train (right), alongside the ground truth (left):

The results from our model are almost identical to the ground truth, a testament to the quality of predictions from RF-DETR.

Without further ado, let’s get started!

💡
You can follow along with this guide using our Colab training notebook. We recommend training with an A100.

Prepare a Dataset

To get started, we need to prepare a dataset. For this guide, we will use a mahjong tile recognition dataset, one of the datasets in the RF100-VL benchmark. This dataset contains over 2,000 images of mahjong tiles, and is licensed under an Apache 2.0 license.

You can download the dataset from the mahjong tile page on Roboflow Universe. In the training section below, we show how to download the dataset directly into a Colab notebook.

You can follow along with this guide using any dataset you want, although the dataset must be in the COCO JSON format to train an RF-DETR model.

If you need to label a dataset, you can do so using Roboflow Annotate, our fully-featured, web-based annotation tool. Annotate comes with a range of tools to speed up the labeling process, including a suite of AI-assisted labeling tools. 

If you need to convert a dataset to COCO JSON, you can upload it to Roboflow and export the data in the COCO JSON format.
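For reference, a COCO JSON annotation file is a single JSON object with images, annotations, and categories lists, and each annotation stores its box as [x, y, width, height] in pixels. The sketch below is a minimal sanity check for such a file. The function names check_coco and check_coco_file, and the specific checks, are illustrative assumptions and not part of the RF-DETR SDK or any Roboflow tooling.

```python
import json


def check_coco(coco: dict) -> list:
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if not isinstance(coco.get(key), list):
            problems.append(f"missing or non-list top-level key: {key}")
    if problems:
        return problems

    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}

    for ann in coco["annotations"]:
        ann_id = ann.get("id")
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann_id}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")
        bbox = ann.get("bbox", [])
        # COCO boxes are [x, y, width, height]; width and height must be positive
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann_id}: invalid bbox {bbox}")
    return problems


def check_coco_file(path: str) -> list:
    """Load a COCO JSON file from disk and run the same checks."""
    with open(path, "r", encoding="utf-8") as f:
        return check_coco(json.load(f))


# A tiny hand-written example (not taken from the mahjong dataset)
example = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 0, "name": "tile"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 0, "bbox": [10, 20, 30, 40]}
    ],
}
print(check_coco(example))  # prints []
```

An empty list means none of these particular checks failed. It does not guarantee that the file meets every requirement of a given training pipeline.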

Train on Device

We have prepared a Colab notebook that you can use to follow along with this guide. Our Colab notebook walks through everything from trying RF-DETR with the base COCO weights to training and running inference on a fine-tuned model.

We recommend using an NVIDIA A100 GPU to fine-tune RF-DETR.

Step #1: Install the RF-DETR SDK

To get started, we need to install the RF-DETR SDK. The code later in this guide also imports the roboflow and supervision packages, so we install them at the same time:

!pip install -q rfdetr roboflow supervision

In addition, run nvidia-smi to make sure that you have a GPU available. The output should list your GPU.
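If you would rather check for a GPU from Python, the sketch below wraps the same nvidia-smi tool and returns the GPU names it reports. The helper name gpu_names is hypothetical. It returns an empty list when nvidia-smi is not installed or fails, so it is safe to run on a machine without a GPU.

```python
import shutil
import subprocess


def gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if none are found."""
    if shutil.which("nvidia-smi") is None:
        return []  # nvidia-smi is not on PATH, so no NVIDIA driver is visible
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


names = gpu_names()
print(names if names else "No NVIDIA GPU detected")
```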

Step #2: Download a Dataset

Next, we need to download a dataset into our training environment. We can download the mahjong dataset directly from Roboflow.

First, set your Roboflow API key in an environment variable called ROBOFLOW_API_KEY. Learn how to retrieve your Roboflow API key.
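As a minimal sketch, you could set and verify the variable from Python before downloading. The helper has_api_key is a hypothetical name used only for illustration. The commented-out lines show one way to enter the key interactively without writing it into the notebook.

```python
import os


def has_api_key(env=os.environ, name="ROBOFLOW_API_KEY"):
    """Return True if the named variable is set to a non-empty value."""
    return bool(env.get(name, "").strip())


# One way to set the key interactively (uncomment to use):
# from getpass import getpass
# os.environ["ROBOFLOW_API_KEY"] = getpass("Roboflow API key: ")

if not has_api_key():
    print("ROBOFLOW_API_KEY is not set; set it before downloading the dataset.")
```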

Then, run the following code:

from roboflow import download_dataset

dataset = download_dataset("https://universe.roboflow.com/rf-100-vl/mahjong-vtacs-mexax-m4vyu-sjtd/dataset/2", "coco")

You can replace the dataset URL above with any of the 500,000+ datasets on Roboflow Universe.
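Later in this guide we read annotations from the test/_annotations.coco.json file inside the dataset directory. The sketch below checks that each split has such a file before you start a long training run. It assumes that the train and valid splits follow the same layout as test, and the helper name missing_splits is hypothetical.

```python
from pathlib import Path

ANNOTATION_FILE = "_annotations.coco.json"


def missing_splits(dataset_dir, splits=("train", "valid", "test")):
    """Return the splits that lack an annotation file under dataset_dir."""
    root = Path(dataset_dir)
    return [
        split for split in splits
        if not (root / split / ANNOTATION_FILE).is_file()
    ]


# Example usage once the dataset has been downloaded:
# print(missing_splits(dataset.location))
```

An empty list means every split has an annotation file in the expected place.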

Step #3: Start an RF-DETR Training Job

With a labeled dataset in our training environment, we can now start to fine-tune a model.

To fine-tune a model, we can use this code:

from rfdetr import RFDETRBase

model = RFDETRBase()
history = []

def callback2(data):
	history.append(data)

model.callbacks["on_fit_epoch_end"].append(callback2)

model.train(dataset_dir=dataset.location, epochs=15, batch_size=16, lr=1e-4)

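Here, we load the RF-DETR base model, register a callback that appends the metrics reported at the end of each epoch to the history list, then pass the location of our downloaded dataset to train(). For this guide, we train for 15 epochs with a batch size of 16 and a learning rate of 1e-4. This batch size is optimised for an A100.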
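If your GPU has less memory than an A100, a batch size of 16 may not fit. The RF-DETR README describes a grad_accum_steps training argument for this case, with the aim of keeping batch_size multiplied by grad_accum_steps at 16. Treat both the argument name and the target of 16 as assumptions to verify against the SDK version you have installed, since neither appears in the code above. The helper below, with the hypothetical name accumulation_steps, only does the arithmetic.

```python
def accumulation_steps(batch_size: int, effective_batch_size: int = 16) -> int:
    """Return how many steps to accumulate to reach the effective batch size."""
    if batch_size <= 0 or effective_batch_size % batch_size != 0:
        raise ValueError(
            f"batch_size must be a positive divisor of {effective_batch_size}"
        )
    return effective_batch_size // batch_size


for batch_size in (16, 8, 4):
    print(batch_size, accumulation_steps(batch_size))
# prints:
# 16 1
# 8 2
# 4 4

# Assumed usage (verify the argument name against your installed SDK):
# model.train(dataset_dir=dataset.location, epochs=15, batch_size=4,
#             grad_accum_steps=accumulation_steps(4), lr=1e-4)
```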

When you run the code, you will see messages showing progress as the model trains, like this:

The amount of time it takes to train your model will vary depending on the size of your dataset and the number of epochs you specify. In our tests, it took around an hour to train a model with 2,000 images for 15 epochs using an A100 GPU.

We recommend training for at least 50 epochs for a production model.

Once your model has trained, the model weights and associated metadata will be saved in a directory called output. 
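The exact file names written to the output directory can vary between SDK versions, so the sketch below simply lists candidate weight files, newest first. The .pth extension is an assumption based on common PyTorch practice, and list_checkpoints is a hypothetical helper name.

```python
from pathlib import Path


def list_checkpoints(output_dir="output", pattern="*.pth"):
    """Return files matching pattern in output_dir, most recently modified first."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(root.glob(pattern), key=lambda p: p.stat().st_mtime, reverse=True)


for checkpoint in list_checkpoints():
    size_mb = checkpoint.stat().st_size / 1e6
    print(f"{checkpoint.name}: {size_mb:.1f} MB")
```

The RF-DETR README describes loading a saved checkpoint by passing its path as pretrain_weights when constructing RFDETRBase. Check that against your installed version before relying on it.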

Step #4: Review Model Evaluation Metrics

You can review your model metrics by plotting the per-epoch data that our callback collected in the history list during training.

You can plot loss using the following code:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(history)

plt.figure(figsize=(12, 8))

plt.plot(
	df['epoch'],
	df['train_loss'],
	label='Training Loss',
	marker='o',
	linestyle='-'
)

plt.plot(
	df['epoch'],
	df['test_loss'],
	label='Validation Loss',
	marker='o',
	linestyle='--'
)

plt.title('Train/Validation Loss over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.show()

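This will display a chart showing training and validation loss per epoch. Loss should generally decrease over time; validation loss that starts rising while training loss keeps falling suggests overfitting.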
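Alongside the chart, it can help to find the epoch with the lowest validation loss programmatically. The sketch below works on the same history list and the same epoch and test_loss keys used in the plotting code. The helper name best_epoch is hypothetical, and the numbers in the example are made up for illustration rather than taken from a real training run.

```python
def best_epoch(history, key="test_loss"):
    """Return the history entry with the lowest value for key, or None."""
    scored = [row for row in history if key in row]
    if not scored:
        return None
    return min(scored, key=lambda row: row[key])


# Made-up values for illustration only
history_example = [
    {"epoch": 0, "train_loss": 9.0, "test_loss": 8.0},
    {"epoch": 1, "train_loss": 7.0, "test_loss": 6.5},
    {"epoch": 2, "train_loss": 6.0, "test_loss": 6.9},
]

best = best_epoch(history_example)
print(best["epoch"])  # prints 1
```

In the example, validation loss rises again after epoch 1 while training loss keeps falling, which is the pattern to watch for in your own runs.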

Our training notebook also walks through an example showing how to calculate AP over epochs.
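As a rough sketch of that idea, the helper below pulls AP values out of the history list. It assumes each entry has a test_coco_eval_bbox key holding the standard COCO evaluation statistics, where index 0 is AP at IoU 0.50:0.95 and index 1 is AP at IoU 0.50. The key name is an assumption; print history[0].keys() after training to confirm what your SDK version records. The helper name ap_per_epoch and the example values are made up for illustration.

```python
def ap_per_epoch(history, key="test_coco_eval_bbox"):
    """Return (epoch, AP@[.50:.95], AP@.50) tuples for entries that have key."""
    rows = []
    for row in history:
        stats = row.get(key)
        if stats is None or len(stats) < 2:
            continue  # skip epochs where no evaluation was recorded
        rows.append((row.get("epoch"), float(stats[0]), float(stats[1])))
    return rows


# Made-up values for illustration only
ap_history_example = [
    {"epoch": 0, "test_coco_eval_bbox": [0.10, 0.20]},
    {"epoch": 1, "test_coco_eval_bbox": [0.30, 0.50]},
    {"epoch": 2},
]

for epoch, ap, ap50 in ap_per_epoch(ap_history_example):
    print(epoch, ap, ap50)
# prints:
# 0 0.1 0.2
# 1 0.3 0.5
```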

Test Your RF-DETR Model

With a trained model ready, the next step is to run inference on an example image.

To load data from our test set and visualize predictions on example images, we are going to use the supervision Python package. supervision has a range of utilities for use in building computer vision applications, including data loaders and annotators.

Let’s load our test set:

import supervision as sv

ds = sv.DetectionDataset.from_coco(
	images_directory_path=f"{dataset.location}/test",
	annotations_path=f"{dataset.location}/test/_annotations.coco.json",
)

Next, let’s load an example image from our test set (here, the image at index 4) and compare the ground truth to the results from our model:

from PIL import Image

# Take one example from the test set; change the index to inspect other images
path, image, annotations = ds[4]

# Reload the image with PIL so that image.size returns (width, height)
image = Image.open(path)

# Run the fine-tuned model, keeping detections above the 0.5 confidence threshold
detections = model.predict(image, threshold=0.5)

text_scale = sv.calculate_optimal_text_scale(resolution_wh=image.size)
thickness = sv.calculate_optimal_line_thickness(resolution_wh=image.size)

bbox_annotator = sv.BoxAnnotator(thickness=thickness)
label_annotator = sv.LabelAnnotator(
	text_color=sv.Color.BLACK,
	text_scale=text_scale,
	text_thickness=thickness,
	smart_position=True)

annotations_labels = [
	f"{ds.classes[class_id]}"
	for class_id
	in annotations.class_id
]

detections_labels = [
	f"{ds.classes[class_id]} {confidence:.2f}"
	for class_id, confidence
	in zip(detections.class_id, detections.confidence)
]

annotation_image = image.copy()
annotation_image = bbox_annotator.annotate(annotation_image, annotations)
annotation_image = label_annotator.annotate(annotation_image, annotations, annotations_labels)

detections_image = image.copy()
detections_image = bbox_annotator.annotate(detections_image, detections)
detections_image = label_annotator.annotate(detections_image, detections, detections_labels)

sv.plot_images_grid(images=[annotation_image, detections_image], grid_size=(1, 2), titles=["Annotation", "Detection"])

Above, we load an example image, run inference with our fine-tuned model, then plot the ground truth annotations and the model results side by side.
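To put a number on how closely the predictions match the ground truth, a common measure is intersection over union (IoU). The sketch below is a plain-Python illustration that takes boxes as [x_min, y_min, x_max, y_max], the layout used by the xyxy attribute of supervision detections. The helpers iou and match_rate are hypothetical names, and match_rate ignores class labels, so treat it as a quick check and not as a replacement for mAP.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x_min, y_min, x_max, y_max] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def match_rate(truth_boxes, predicted_boxes, threshold=0.5):
    """Fraction of ground-truth boxes overlapped by at least one prediction."""
    if len(truth_boxes) == 0:
        return 0.0
    hits = sum(
        any(iou(truth, pred) >= threshold for pred in predicted_boxes)
        for truth in truth_boxes
    )
    return hits / len(truth_boxes)


print(iou((0, 0, 10, 10), (0, 0, 10, 10)))            # prints 1.0
print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # prints 0.333

# Example usage with the objects from the code above:
# print(match_rate(annotations.xyxy, detections.xyxy))
```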

Here are the results: 

RF-DETR successfully identified most of the mahjong tiles. A few tiles were missed, which may improve if the model is trained for more epochs.

Now that you have visualized the results from your model, the next step is to think about model deployment. In addition to offering strong accuracy, RF-DETR models also run at 25 FPS on an NVIDIA T4. This makes them ideal for use in edge deployment environments.
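Throughput depends on your hardware, image size, and runtime, so it is worth measuring on the device you plan to deploy to. The sketch below times any callable and reports calls per second. The helper name measure_fps is hypothetical. Timing model.predict this way includes pre- and post-processing, so the result may not be directly comparable to the T4 figure quoted above.

```python
import time


def measure_fps(fn, warmup=3, runs=20):
    """Call fn repeatedly and return the average number of calls per second."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return runs / max(elapsed, 1e-9)  # guard against a zero-length interval


# Example usage with the model and image from the code above:
# fps = measure_fps(lambda: model.predict(image, threshold=0.5))
# print(f"{fps:.1f} FPS")
```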

We will also be launching support for deploying RF-DETR in Roboflow Inference, our open source computer vision inference server. This will be accompanied by support in Roboflow Workflows, our vision AI application builder. This will be announced in the coming days.

Conclusion

RF-DETR is a new, real-time computer vision model architecture developed by Roboflow. It is the first documented real-time object detection model to exceed 60 mAP on the Microsoft COCO benchmark, and it runs at around 25 FPS on an NVIDIA T4.

In this guide, we walked through how to train an RF-DETR model on a custom dataset. We downloaded an open source object detection dataset from Roboflow Universe, trained a model for 15 epochs, plotted training graphs, and visualized the results of the model.

Curious to learn more about RF-DETR? Check out our RF-DETR announcement, in which we talk more about the inspiration behind the model and its architecture.