Dataset quality and the ability to easily curate your data are important to building an effective computer vision model. The easier it is for you to search and explore your data, the better you can curate your dataset to improve model performance.
We are excited to announce advanced dataset search filters, operators, and logic, available now in all Roboflow workspaces. These features enable you to better explore and understand your dataset at all stages of model building, from preparing your first dataset version to making incremental improvements as your model is used in production.
The new filtering features accompany the existing semantic search capabilities in Roboflow dataset search. These capabilities allow you to search for an abstract keyword (e.g. “shipping container”) and find related images. Now, you can combine a semantic search query with our advanced filters to narrow your results.
In this guide, we are going to show you how to use the new dataset search features available in the Roboflow application to curate datasets for building computer vision models. Without further ado, let’s get started!
Introducing Advanced Dataset Search
Consider a scenario where your model struggles to identify one class even though you have labeled many images with that class. With an effective dataset exploration tool, you can examine your existing images to answer questions like “are my images too similar?” and “are there unannotated instances of this class?”
Roboflow’s new search feature makes answering such questions – and many other questions you have about your dataset – easier than ever. In the Roboflow application, you can now search images by:
- Image width and height
- The number of annotations in an image
- The classes present in the image (or exclude images with a labeled class)
- The split an image is in
You can combine search features using AND or OR statements, allowing you to build complex queries to explore your dataset.
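To make the AND/OR semantics concrete, here is a minimal Python sketch that models each image as a small record and each search attribute as a predicate. The `Image` record and the `has_class`, `in_split`, `all_of`, `any_of`, and `search` helpers are hypothetical illustrations, not part of any Roboflow API; the sketch only mirrors the filter behavior described in this post.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an image record; not a Roboflow type.
@dataclass
class Image:
    filename: str
    split: str           # "train", "valid", or "test"
    classes: List[str]   # one entry per annotation, e.g. ["cat", "dog", "dog"]

Predicate = Callable[[Image], bool]

def has_class(name: str) -> Predicate:
    # Like class:NAME, true if at least one annotation carries this label.
    return lambda img: name in img.classes

def in_split(split: str) -> Predicate:
    # Like split:SPLIT.
    return lambda img: img.split == split

def all_of(*preds: Predicate) -> Predicate:
    # AND: every predicate must hold.
    return lambda img: all(p(img) for p in preds)

def any_of(*preds: Predicate) -> Predicate:
    # OR: at least one predicate must hold.
    return lambda img: any(p(img) for p in preds)

def search(images: List[Image], pred: Predicate) -> List[str]:
    # Return the filenames of every image that satisfies the predicate.
    return [img.filename for img in images if pred(img)]

images = [
    Image("a.jpg", "train", ["cat", "dog"]),
    Image("b.jpg", "valid", ["cat"]),
    Image("c.jpg", "train", ["dog", "dog", "person"]),
]

cat_and_dog = search(images, all_of(has_class("cat"), has_class("dog")))
cat_or_dog = search(images, any_of(has_class("cat"), has_class("dog")))
# An OR nested inside an AND: (cat OR dog) AND split is train.
train_pets = search(images, all_of(any_of(has_class("cat"), has_class("dog")), in_split("train")))

print(cat_and_dog)  # ['a.jpg']
print(cat_or_dog)   # ['a.jpg', 'b.jpg', 'c.jpg']
print(train_pets)   # ['a.jpg', 'c.jpg']
```

The nested expression is only meant to show how AND and OR compose in principle; refer to the Search a Dataset documentation for the exact grouping syntax Roboflow accepts.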
Here are some of the many questions you can answer with the new dataset search features:
- How many images contain an object that is not labeled?
- How many images contain a specific combination of classes?
- How many images in your validation set feature a particular class?
- How many images contain fewer than two annotations?
- How many images contain a specific label and feature at least three annotations?
Let’s walk through how to use the dataset search and then show a few examples. To see a full reference list of search capabilities in Roboflow, refer to the Search a Dataset documentation.
To access the new dataset search, click on the Images tab in the sidebar of a project in your workspace. Then, click the search bar above the images on the page. This search bar is enabled with our new dataset search features.
When you open the search bar, several example “operators” will appear. An operator is an attribute by which you can query.
Let’s run a few queries. For this guide, we will use the Microsoft COCO dataset, which contains over 120,000 images. First, suppose we want to find all images that contain a cat and a dog, two classes in our dataset. We can find them using the following query:
class:cat AND class:dog
Above, there are many examples of images with labeled cats and dogs. We could make the query more specific by also filtering by split (e.g. only show images with annotated cats and dogs in the training set), by filename, or by any of the other attributes mentioned above.
Let’s run another test. Suppose our model performs poorly at identifying cats. We can run a query to look for all images that do not contain a “cat” annotation but do contain a cat. We can do so by leveraging the semantic search capabilities built into the Roboflow search feature.
When you specify a keyword to search for (e.g. “cat”), Roboflow orders search results by their relevance to that keyword. We do this using vector embeddings: we calculate an embedding for your text query and compare it to the image embeddings for your dataset, then return the images whose embeddings are closest to the query.
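The ranking step can be sketched in a few lines of Python. This is a minimal illustration, not Roboflow's implementation: it assumes cosine similarity as the measure of closeness (a common choice for CLIP-style embeddings), and the hard-coded three-dimensional vectors stand in for real embeddings, which have hundreds of dimensions and come from an embedding model. `rank_by_similarity` is a hypothetical helper.

```python
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(
    query: List[float], image_embeddings: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    # Score every image against the query and sort from most to least similar.
    scored = [(name, cosine_similarity(query, emb)) for name, emb in image_embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real text and image embeddings.
text_query = [0.9, 0.1, 0.0]  # pretend embedding of the text "cat"
image_embeddings = {
    "kitchen.jpg": [0.1, 0.9, 0.2],
    "sofa_cat.jpg": [0.8, 0.2, 0.1],
    "street.jpg": [0.0, 0.3, 0.9],
}

for name, score in rank_by_similarity(text_query, image_embeddings):
    print(f"{name}: {score:.3f}")
```

With these toy vectors, `sofa_cat.jpg` ranks first because its vector points in nearly the same direction as the query vector.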
To find images where we may have missed annotating a cat, we can combine the -class:cat filter with the text query “cat”. This query excludes all images that contain a “cat” annotation, then ranks the remaining images by their relevance to the text query “cat”. Here are the results:
We can click on an image to inspect it more closely:
In this image, there is a “dining table” label but no label for the cat: an annotation was missed during labeling. We could add the missing annotation and repeat this process for other classes to clean up the dataset.
Suppose we want to look for images that have the class “cell phone” and at least three annotations on the image in total. We could do so using the following query:
class:"cell phone" min-annotations:3
The search query successfully returned images that feature class “cell phone” and contain at least three annotations. Note: The min-annotations search flag counts all annotations, not annotations in a specific class.
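The distinction in the note above affects which images come back. The following minimal Python sketch uses a hypothetical dataset of per-image annotation labels to show how class:"cell phone" min-annotations:3 behaves under the described semantics (at least one “cell phone” annotation and at least three annotations of any class), contrasted with a stricter count of cell phones alone, which the min-annotations flag does not perform. Both helper functions are illustrative only.

```python
from typing import List

# Hypothetical annotation labels per image; one list entry per annotation.
dataset = {
    "desk.jpg": ["cell phone", "laptop", "cup"],
    "hands.jpg": ["cell phone", "cell phone", "cell phone"],
    "table.jpg": ["cell phone", "cup"],
    "office.jpg": ["laptop", "cup", "person", "chair"],
}

def matches_query(labels: List[str], required_class: str, min_total: int) -> bool:
    # class:<required_class> combined with min-annotations:<min_total>.
    # min-annotations counts every annotation, whatever its class.
    return required_class in labels and len(labels) >= min_total

def has_min_of_class(labels: List[str], required_class: str, min_count: int) -> bool:
    # Stricter alternative: at least min_count annotations of this one class.
    # This is NOT what min-annotations does; shown for contrast only.
    return labels.count(required_class) >= min_count

query_hits = sorted(n for n, labels in dataset.items() if matches_query(labels, "cell phone", 3))
strict_hits = sorted(n for n, labels in dataset.items() if has_min_of_class(labels, "cell phone", 3))
print(query_hits)   # ['desk.jpg', 'hands.jpg']
print(strict_hits)  # ['hands.jpg']
```

Here `desk.jpg` matches the query even though it contains only one cell phone, because its other two annotations count toward the total.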
Below are the search filters available at the time of launching this feature. Refer to the Search a Dataset documentation for the latest updates on advanced dataset search in Roboflow.
like-image:<SOURCE_ID>: Sorts images by semantic similarity to the image with the provided source ID, measured using CLIP.
tag: Filter by user-provided tags.
filename: Searches for file names that match the provided file name. Add * at the beginning and end of a query to run a partial match.
split: Filters by split (train, test, valid).
job:<JOB_ID>: Shows images with the provided job ID.
min-width:X: Shows images with a width of at least X pixels.
max-width:X: Shows images with a width of at most X pixels.
min-height:X: Shows images with a height of at least X pixels.
max-height:X: Shows images with a height of at most X pixels.
min-annotations:X: Shows images with at least X annotations, counted across all classes.
max-annotations:X: Shows images with at most X annotations.
class:CLASS: Shows images that have at least one annotation with the provided label.
-class:CLASS: Shows images that do not have any annotation with the provided label.
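The filters above share a simple operator:value shape. As a minimal sketch of how such a query string might be tokenized, the Python below uses the standard library's shlex module so that quoted values like class:"cell phone" stay together. The `parse_query` function and its output shape are hypothetical; this is not Roboflow's parser, and treating any token without a colon as a free-text term for semantic search is an assumption based on the examples in this post.

```python
import shlex
from typing import Dict

def parse_query(query: str) -> Dict[str, list]:
    """Split a search string into filters, AND/OR connectives, and free-text terms."""
    filters = []      # (operator, value, negated) tuples
    connectives = []  # "AND" / "OR" in the order they appear
    text_terms = []   # bare words, assumed to feed the semantic search
    # shlex.split keeps quoted values together, so class:"cell phone"
    # becomes the single token 'class:cell phone' (unbalanced quotes raise ValueError).
    for token in shlex.split(query):
        if token in ("AND", "OR"):
            connectives.append(token)
        elif ":" in token:
            negated = token.startswith("-")  # a leading "-" excludes, as in -class:cat
            key, value = token.lstrip("-").split(":", 1)
            filters.append((key, value, negated))
        else:
            text_terms.append(token)
    return {"filters": filters, "connectives": connectives, "text": text_terms}

print(parse_query('class:cat AND class:dog'))
print(parse_query('class:"cell phone" min-annotations:3'))
print(parse_query('split:valid -class:cat'))
```

The sketch stops at tokenizing: a search backend would still need to evaluate each filter against image metadata, and it deliberately does not guess how Roboflow groups mixed AND/OR expressions.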
You can combine the attributes above using AND and OR statements.
The new Roboflow search feature, available now, provides a powerful set of tools for searching and exploring a dataset. With the features described above, you can find images that meet specific criteria, identify images that are missing labels, and find images that feature a particular piece of metadata (e.g. a filename that contains a string, or an image with a tag).
This feature is available to all users on free and paid plans. If you are interested in storing large, private datasets in Roboflow for use with our advanced dataset search feature, contact the Roboflow sales team to learn more about pricing.