Launch: Active Learning with Roboflow
When training computer vision models, it’s best to train on data representative of your use case.
For example, consider a scenario where you want to label freight containers for use in a yard management solution. Gathering images of freight containers from your own yards will allow you to train a more effective model. This is because your input data will better reflect the environment in which your model will be deployed.
Gathering images in real time also helps counter model drift, where your model becomes less accurate over time due to changes in the objects you need to detect or the environment in which your model is deployed.
How do you gather the images you need to train or improve a model? When you are starting with no data at all, this is the so-called “cold start problem”.
We recommend using active learning, a technique where you sample images from input sources like videos according to a configuration you set. Frames that match your configuration are uploaded from your system to your Roboflow dataset. You can then label this data to improve your model.
In this guide, we are going to show you how to use active learning with Roboflow.
Without further ado, let’s get started!
What is Active Learning?
Active learning is a technique for avoiding model drift and improving the performance of computer vision models. With active learning, you consistently gather new images for use in training new versions of your model. Active learning is ideal when you have the first version of your model ready and want to improve its accuracy quickly after going to production.
The following steps comprise an active learning workflow:
- Train an initial model, with the option to use an off-the-shelf model as a starting point;
- Evaluate your model to identify opportunities for improvement;
- Collect images to use for training future model versions;
- Label objects of interest from collected images;
- Train a new version of your model using your updated dataset; and
- Repeat these steps for progressive improvement.
Building your own active learning pipeline can be difficult. You have to write your own logic to collect images, which can get complicated fast, and upload those images to a central data warehouse.
With Roboflow, using active learning is easier than ever. You can configure active learning in a web interface, then roll out your configuration to all devices on which your model is deployed. Images gathered with active learning are automatically uploaded to your dataset for use in training a new version of your model.
How to Use Active Learning with Roboflow
Active learning with Roboflow is integrated directly into Roboflow Inference, an edge deployment solution with which you can run models hosted on Roboflow as well as foundation models such as CLIP and Grounding DINO.
You can set your active learning configuration in the Roboflow web application. This will then be retrieved by your Inference server for use in evaluating how to collect images.
Step #1: Start a Project
To use active learning in your project, you can:
- Use a pre-trained model. Roboflow Universe has over 50,000 models you can deploy on your own hardware with Roboflow Inference.
- Train your own model using images you already have.
- Collect data by running Roboflow Inference without a model, known as a model stub.
If you do not yet have a model and cannot find a pre-trained model for your use case, you can use a model stub. A model stub allows you to run Roboflow Inference without an existing model. You can enable active learning to collect data automatically according to your configuration (which we will talk about in the next step).
If you want to use a model stub, which is ideal if you don't have a model yet, set your-project-name/0 as your Inference model ID when you create an Inference pipeline. Read the Inference documentation on model stubs for more information.
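As a minimal sketch of the model ID convention described above, the helper below builds an ID in the project/version form and falls back to version 0 for a stub. The function name `build_model_id` is hypothetical and is not part of Roboflow Inference; only the `your-project-name/0` stub format comes from this guide.

```python
from typing import Optional


def build_model_id(project_name: str, version: Optional[int] = None) -> str:
    """Hypothetical helper: return a model ID of the form 'project/version'.

    When no trained version exists yet, fall back to version 0, which this
    guide describes as the model stub.
    """
    if not project_name:
        raise ValueError("project_name must not be empty")
    if version is not None and version < 0:
        raise ValueError("version must be zero or positive")
    stub_version = 0
    return f"{project_name}/{stub_version if version is None else version}"


print(build_model_id("your-project-name"))      # your-project-name/0
print(build_model_id("your-project-name", 3))   # your-project-name/3
```

The resulting string is what you would pass as the model ID when creating an Inference pipeline. Check the Inference documentation for the exact parameter name in the version you are running.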
Step #2: Enable Active Learning
Once you have a trained or stub model ready, you can enable active learning. Click “Active Learning” in your Roboflow project. On this page, all supported strategies for data collection are listed. At the time of writing this post, two strategies are supported:
- Random sampling, which randomly chooses whether to collect an image to train a model; and
- Close-to-threshold sampling, which collects images where predictions are in a defined confidence range or from specific classes.
For example, consider a scenario where you want to improve a shipping container detection model. You could collect all images that match the class “shipping container” with a minimum confidence of 0.3 and a maximum confidence of 0.5. You can set this up in the Active Learning tab associated with your project.
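To make the shipping container example concrete, here is a rough sketch of the kind of check a close-to-threshold strategy performs. It is an illustration only, not Roboflow's implementation: the `Prediction` record, the function name, and the parameter names are all hypothetical, and the inclusive bounds are an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Prediction:
    """Hypothetical, simplified prediction record (not the Inference schema)."""
    class_name: str
    confidence: float


def matches_close_to_threshold(
    predictions: Iterable[Prediction],
    min_confidence: float = 0.3,
    max_confidence: float = 0.5,
    selected_classes: Optional[Set[str]] = None,
) -> bool:
    """Return True if any prediction of a selected class lies in the band.

    Bounds are treated as inclusive here, which is an assumption.
    """
    for prediction in predictions:
        # Skip classes the configuration does not target.
        if selected_classes is not None and prediction.class_name not in selected_classes:
            continue
        if min_confidence <= prediction.confidence <= max_confidence:
            return True
    return False


frame_predictions = [
    Prediction("shipping container", 0.42),  # uncertain: inside the 0.3 to 0.5 band
    Prediction("truck", 0.91),               # confident, and not a selected class
]
print(matches_close_to_threshold(
    frame_predictions, selected_classes={"shipping container"}
))  # True
```

A frame is kept as soon as one matching prediction falls in the band, so confident detections elsewhere in the same image do not prevent collection.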
With this configuration, you can gather images that may contain objects of interest and thus could be used to train a new, improved version of your model.
When you're working with an untrained model and relying on a stub, it's recommended to opt for random sampling with a modest “traffic percentage” (default is set at 0.5%). Once the model is trained, we suggest evaluating its predictions with Roboflow’s model evaluation tools. Then, identify the confidence levels associated with false negatives and apply close-to-threshold sampling to gather more data in areas where your model struggles.
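The effect of a small traffic percentage can be sketched in a few lines. The example below is a simplified illustration, not Roboflow's implementation: `should_collect` is a hypothetical function, and it assumes the traffic percentage is expressed as a percent from 0 to 100, so the 0.5% default is written as `0.5`.

```python
import random


def should_collect(traffic_percentage: float, rng: random.Random) -> bool:
    """Return True for roughly `traffic_percentage` percent of calls."""
    if not 0.0 <= traffic_percentage <= 100.0:
        raise ValueError("traffic_percentage must be between 0 and 100")
    # rng.random() is uniform on [0.0, 1.0), so 0 never samples and 100 always does.
    return rng.random() < traffic_percentage / 100.0


rng = random.Random(42)  # fixed seed so the sketch is repeatable
frames_seen = 100_000
collected = sum(should_collect(0.5, rng) for _ in range(frames_seen))
print(f"collected {collected} of {frames_seen} frames")
```

At 0.5%, about one frame in every 200 is kept on average, which keeps upload and labeling volume manageable while a stub is running on a continuous video stream.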
Once you have set your active learning configuration, start or restart Roboflow Inference. Your Inference server will automatically collect data for your project and upload the images to Roboflow.
Step #3: Label Data
Shortly after enabling active learning, you will be able to see collected data in the Roboflow dashboard. If you use active learning with a model (instead of a stub model), predictions from the model will be available as initial annotations that you can refine.
Reviewing Annotations with a Human in the Loop
You can then refine your annotations manually in Roboflow using Roboflow Annotate, or using our Outsourced Labeling solution for human-in-the-loop (HITL) review.
Whether you need 24/7 support from Roboflow’s team to label data as soon as it is collected, or you want labeling on a less urgent but regular cadence (daily, weekly, etc.), working with professional labelers helps you curate high-quality datasets for continued model training.
There are three steps to start using HITL labeling: speak with our team, label a sample dataset, then roll out HITL to your active learning pipeline.
Step 1: Speak with our team
The first step in getting started with HITL labeling is to book 15 minutes with our team to discuss your project in detail. In that initial conversation, you will cover the scope of your labeling needs and the different solutions Roboflow has available.
Step 2: Label a sample dataset
After your initial call, you will be put in touch with our labeling team to annotate a sample dataset. This step is to ensure that the data is labeled correctly and all clarifying questions are addressed before labeling data in production.
Step 3: Deploy HITL
Once you are satisfied with the results from the sample dataset, the labelers will then begin working on the data that is being directly collected through your Active Learning pipeline. HITL labeling can be easily scaled up or down as your data labeling needs change over time.
If you have any questions about HITL active learning with review from outsourced labelers, email labeling@roboflow.com.
Conclusion
Active learning is a technique to improve the performance of your computer vision model. Active learning involves gathering images that meet a specified condition. You can then label these images, or refine the labels from the initial prediction, for use in training a new version of your computer vision model.
Active learning is an iterative process. You can keep active learning running for as long as you need to improve your model. If you don’t already have a model version with which to start, you can use a stub model. Then you can gather data with active learning and train the first version of your model.
Roboflow’s active learning solution, integrated into both our web application and our edge Inference product, provides a suite of tools for putting active learning into practice. You can configure your active learning requirements in our web app, and they will then be applied on every device where you have deployed your model with Inference.