How to Build an Automated Multimodal Data Labeling Pipeline
Published Aug 29, 2024

Roboflow Workflows is a low-code computer vision application builder. With Workflows, you can build multi-step computer vision workflows in a browser editor. You can then deploy your Workflows using the Roboflow API, a Dedicated Deployment, or on your own hardware.

One use case for Workflows is to build a system that automatically labels data for use in training new versions of AI models. This paradigm is a subset of Active Learning, in which predictions from a model are saved to a dataset for use in training future model versions.

In this guide, we are going to walk through how to build a Workflow that uses a foundation model, YOLO World, to auto-label data. Foundation models like YOLO World require no training data, which makes them well suited to bootstrapping new models without manual labeling.

Here is a demo of our Workflow in use, which runs YOLO World then adds the results from the model to our dataset:

Without further ado, let’s get started!

Step #1: Create a Dataset

To build an active learning pipeline, we will need to create a Roboflow dataset into which we can upload annotated images.

To get started, first create a free Roboflow account. Then, go to the Roboflow dashboard and click “Create Project” to create a project. Choose the object detection dataset type:

If you already have labeled images, you can upload them into your dataset or you can upload images that need to be annotated.

Adding images is optional at this stage: our active learning Workflow will add images automatically when we run it later in this guide.

Once you have created a new dataset, you are ready to build an active learning workflow.

To create a Workflow, navigate to the Roboflow dashboard and click “Workflows” in the sidebar. You will be taken to the Workflows home page, where you can create a new Workflow.

Click the “Create Workflow” button. A window will appear from which you can choose an example workflow. For this project, select the “Custom Workflow” option:

You will be taken to the Roboflow Workflows interface in which you can build your Workflow.

Step #2: Add a Foundation Model

Our automated labeling workflow will use a foundation model to generate annotations for use in training an object detection model. For this guide, we will use YOLO World, a zero-shot object detection model.

Foundation models like YOLO World are good at identifying generic objects (e.g. cars, screws, metal), but less effective at identifying specific objects (e.g. a product SKU).

For this guide, we will auto-label a dataset that will be used to identify shipping containers.

You could use any model to auto-label data, including foundation models and any of the 50,000+ pre-trained models hosted on Roboflow Universe. For this guide, we will focus on YOLO World. To use a Universe model, swap the YOLO World block for a Roboflow Object Detection block and set the model ID to any model on Universe.

Click “Add Block” in the Workflows interface, then select YOLO World:

In the block configuration panel, set the classes you want to identify in the “Class Names” field:

We can preview the results from the model by adding a Bounding Box Visualization component. This component takes in detections from a model and displays the detections as bounding boxes on the input image:
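The Bounding Box Visualization block handles the drawing for you. Here is a minimal Python sketch of one detail any such visualizer has to handle. It assumes each prediction is a dict whose x and y keys give the box center, alongside width and height, as in Roboflow's JSON detection output. Drawing routines usually expect corner coordinates instead. The function name center_to_corners and the sample values are hypothetical.

```python
def center_to_corners(pred):
    """Convert a center-format box (x, y, width, height) into corner coordinates."""
    half_w = pred["width"] / 2
    half_h = pred["height"] / 2
    return (
        pred["x"] - half_w,  # x_min (left edge)
        pred["y"] - half_h,  # y_min (top edge)
        pred["x"] + half_w,  # x_max (right edge)
        pred["y"] + half_h,  # y_max (bottom edge)
    )


# Hypothetical prediction in the center-based format assumed above.
container = {"x": 320.0, "y": 240.0, "width": 200.0, "height": 100.0,
             "confidence": 0.91, "class": "container"}
print(center_to_corners(container))  # (220.0, 190.0, 420.0, 290.0)
```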

Our Workflow now has two steps:

  1. Use YOLO World to detect shipping containers; and
  2. Use Bounding Box Visualization to show results from the model.

If you have set multiple classes with YOLO World, we recommend adding a Label Visualization so you can see the labels with the bounding boxes.

Here is our Workflow:

Let’s try the Workflow on an image.

To preview a Workflow, click “Run Workflow” at the top of the Workflow builder, drag in the image you want to run through your Workflow, then click “Run Preview”.

The preview returns two outputs:

  1. A JSON representation of the results; and
  2. A visual that shows the predictions from our model plotted on the input image.
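To inspect the JSON output programmatically, you can load it with Python's json module. The sketch below assumes a hypothetical response with a top-level predictions list, where each entry carries a class key. The exact shape of your Workflow's JSON depends on how its outputs are named, so check a real response before relying on these keys. The summarize helper and all sample values are illustrative.

```python
import json
from collections import Counter

# Hypothetical response body; the keys are assumptions modeled on a
# Roboflow-style detection result, not the exact Workflow output schema.
raw = """
{
  "predictions": [
    {"x": 310, "y": 220, "width": 400, "height": 180, "confidence": 0.88, "class": "container"},
    {"x": 620, "y": 260, "width": 380, "height": 170, "confidence": 0.81, "class": "container"},
    {"x": 95, "y": 300, "width": 30, "height": 40, "confidence": 0.34, "class": "container"}
  ]
}
"""


def summarize(response_text):
    """Count detections per class in a JSON detection response."""
    data = json.loads(response_text)
    return Counter(pred["class"] for pred in data["predictions"])


counts = summarize(raw)
print(counts)  # Counter({'container': 3})
```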

To see the visual output, click “Show Visual”:

The container predictions from YOLO World are plotted in purple boxes in the image above. YOLO World successfully identified the shipping containers in the image.

Note that there is one incorrect detection: the truck's wing mirror is detected as a container. This false positive can be corrected after the annotation is saved to our dataset.

Step #3: Add a Dataset Upload Block

Right now, our Workflow runs inference with YOLO World and returns a result. To build an auto-labeling pipeline, we need to add one more step: upload predictions back to a Roboflow dataset.

To do so, we can use the Roboflow Dataset Upload block:

Once you have added the block, you will need to set the dataset to which images should be added. This is the dataset you created in Step #1.

Set the “Target Project” to the project where you want to upload your images:

Make sure the Image value is set to input.image:

Then, click the “Additional Properties” button in the Roboflow Dataset Upload settings menu and configure the block to read predictions from model.predictions, the predictions output of the YOLO World step:

This configuration will add an input image as well as all of the predictions returned from our model for that image to our dataset.
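The Dataset Upload block handles serialization for you. To show what "saving predictions as annotations" means, here is a sketch of one common on-disk format for detection labels: YOLO-style text lines. This is an illustration only. It is not necessarily how the Roboflow Dataset Upload block stores annotations. The sketch assumes center-based predictions (x, y, width, height, class) and a known image size. The function name to_yolo_line and the class_ids mapping are hypothetical.

```python
def to_yolo_line(pred, image_width, image_height, class_ids):
    """Format one center-based prediction as a YOLO-style label line.

    Each line stores: class_id x_center y_center width height,
    with the four box values normalized to the 0-1 range by image size.
    """
    class_id = class_ids[pred["class"]]  # map class name to integer ID
    return "{} {:.6f} {:.6f} {:.6f} {:.6f}".format(
        class_id,
        pred["x"] / image_width,
        pred["y"] / image_height,
        pred["width"] / image_width,
        pred["height"] / image_height,
    )


pred = {"x": 320.0, "y": 240.0, "width": 200.0, "height": 100.0, "class": "container"}
print(to_yolo_line(pred, 640, 480, {"container": 0}))
# 0 0.500000 0.500000 0.312500 0.208333
```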

Step #4: Test the Workflow

We now have our completed Workflow:

This Workflow:

  1. Accepts an image.
  2. Uses YOLO World to auto-label the image.
  3. Visualizes predictions from the model.
  4. Uploads the predictions back to Roboflow.
  5. Returns the predictions and the visualization.
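The data flow through these five steps can be sketched in plain Python. This minimal sketch assumes the detector, visualizer, and uploader are passed in as functions. It mirrors the order of the Workflow's steps and is not how Workflows are executed internally. The names run_auto_label_pipeline, detect, visualize, and upload are hypothetical.

```python
from typing import Any, Callable, Dict, List

Prediction = Dict[str, Any]


def run_auto_label_pipeline(
    image: Any,
    detect: Callable[[Any], List[Prediction]],
    visualize: Callable[[Any, List[Prediction]], Any],
    upload: Callable[[Any, List[Prediction]], None],
) -> Dict[str, Any]:
    """Mirror the Workflow's steps using caller-supplied functions."""
    # Step 1: the image arrives as the function's input.
    predictions = detect(image)                    # Step 2: auto-label with a detector
    visualization = visualize(image, predictions)  # Step 3: draw the predictions
    upload(image, predictions)                     # Step 4: save image + labels to a dataset
    # Step 5: return both outputs to the caller.
    return {"predictions": predictions, "visualization": visualization}


# Usage with stand-in functions in place of real models and APIs.
uploaded = []
result = run_auto_label_pipeline(
    "frame.jpg",
    detect=lambda img: [{"class": "container", "confidence": 0.9}],
    visualize=lambda img, preds: (img, len(preds)),
    upload=lambda img, preds: uploaded.append((img, preds)),
)
print(result["visualization"])  # ('frame.jpg', 1)
```

Passing the steps in as functions keeps the sketch testable without network access. Swapping the detector for a different model only changes one argument, much as swapping the YOLO World block for a Roboflow Object Detection block does in the editor.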

In a new tab, navigate to the Annotate tab of the dataset you created earlier.

Then, in the tab where you have your Workflow open, run the Workflow on an image.

The image will be labeled and then uploaded to your dataset.

Next Steps

With an auto-label Workflow, you can automatically label data for use in training new AI models. In this guide, we built a pipeline for object detection labeling, but you can also use the OpenAI block or any other LMM block to add captions, or the Segment Anything 2 block for segmentation. By combining multiple blocks, you can label images for many tasks, including multimodal model training.

You can use a foundation model like YOLO World to label images without using a model you have fine-tuned yourself. You can also use models you have trained on or uploaded to Roboflow to label data for use in training new models.

You can extend the Workflow we built in this guide to upload predictions conditionally. For example, you can upload only detections above a certain confidence by adding a Detections Filter block that filters by confidence before predictions are visualized and uploaded. You could also upload only predictions whose bounding boxes are wider or narrower than a given size.
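The Detections Filter block is configured in the editor rather than in code, but the logic of such a filter is simple. Here is a plain-Python sketch of a combined confidence and width filter. It assumes predictions are dicts with confidence and width keys. The function name filter_predictions, its parameters, and the sample values are hypothetical, not the block's actual settings.

```python
def filter_predictions(predictions, min_confidence=0.5, min_width=None, max_width=None):
    """Keep predictions at or above a confidence threshold and, optionally, within a width range."""
    kept = []
    for pred in predictions:
        if pred["confidence"] < min_confidence:
            continue  # drop low-confidence detections
        if min_width is not None and pred["width"] < min_width:
            continue  # drop boxes narrower than the lower bound
        if max_width is not None and pred["width"] > max_width:
            continue  # drop boxes wider than the upper bound
        kept.append(pred)
    return kept


# Hypothetical predictions; the last one mimics a small, low-confidence false positive.
predictions = [
    {"class": "container", "confidence": 0.88, "width": 400},
    {"class": "container", "confidence": 0.81, "width": 380},
    {"class": "container", "confidence": 0.34, "width": 30},
]
print(len(filter_predictions(predictions)))  # 2
```

Filtering before upload means low-confidence boxes never reach the dataset, which reduces the amount of manual cleanup needed later.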

To learn more about Workflows, refer to the Roboflow Workflows documentation and our other Workflows guides.

Cite this Post

Use the following entry to cite this post in your research:

James Gallagher. (Aug 29, 2024). How to Build an Automated Multimodal Data Labeling Pipeline. Roboflow Blog: https://blog.roboflow.com/multimodal-data-labeling-pipeline/

Discuss this Post

If you have any questions about this blog post, start a discussion on the Roboflow Forum.

Written by

James Gallagher
James is a technical writer at Roboflow, with experience writing documentation on how to train and use state-of-the-art computer vision models.