Tags give users another way to search through content. But how can you generate tags for images or videos? That is where computer vision comes in. With the Roboflow image tagging API, powered by the CLIP machine learning model, you can provide a list of suggested tags and find the tag most representative of the content in an image.

In this guide, we are going to showcase how to tag images using the Roboflow image tagging API. We will then talk about ways you can detect specific objects using object detection, with links to resources to help you get started.

By the end of this guide, you will be able to retrieve the most relevant tag for an image given a set of tags.

For example, given the suggestions “construction site”, “busy street”, and “something else”, this image was tagged as “construction site”:

Without further ado, let’s get started!

What is Image Tagging?

Image tagging refers to selecting one or more tags that represent an attribute associated with an image. There are two ways of tagging images: manually or automatically. Manual tagging is useful if you have a small number of images that should be given a particular tag (e.g. a “featured” section on a website). For larger numbers of images, automated tagging can be less time intensive.

One common application of image tagging is to enable richer semantic search experiences. For example, consider a project where you are surveying the amount of construction on a given journey. You could use image tags to make it easy to search for images by category (e.g. whether an image contains a construction site, a street, or something else).

Using the Roboflow Image Tagging API

Roboflow maintains an image tagging API that you can use to assign tags to an image. This API is powered by CLIP, an open source computer vision model developed by OpenAI and released with commercial and private use permissions. You provide a set of tags and CLIP tells you which tag is most likely to represent the content in an image. You do not need any experience with computer vision to use the API.

There is no list of allowed tags. You can experiment with your own tags.

To use the API, first create a free Roboflow account. Then, download our starter image tagging script. This script has all the code you will need to tag an image.

To download the starter script and install the required dependency, run the following code:

git clone https://github.com/roboflow/templates
cd templates/examples/image-tagging
pip install requests

You can run the script using the following code:

python tag.py --image=image.jpg --tags="construction site, busy street, something else" --roboflow_api_key="key"

This script accepts three arguments:

  1. image: The image with which you want to work.
  2. tags: The suggested tags that should be evaluated.
  3. roboflow_api_key: Your Roboflow API key.
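
As a rough illustration, a script with this interface might read its arguments as shown below. This is a minimal sketch using Python's standard argparse module, not the code in the starter script; the helper names build_parser and split_tags are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the three arguments described above."""
    parser = argparse.ArgumentParser(description="Find the most relevant tag for an image.")
    parser.add_argument("--image", required=True, help="Path to the image to tag.")
    parser.add_argument("--tags", required=True, help="Comma-separated list of suggested tags.")
    parser.add_argument("--roboflow_api_key", required=True, help="Your Roboflow API key.")
    return parser


def split_tags(raw_tags: str) -> list[str]:
    """Split a comma-separated string into clean tag names, dropping empty entries."""
    return [tag.strip() for tag in raw_tags.split(",") if tag.strip()]


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args([
    "--image=image.jpg",
    "--tags=construction site, busy street, something else",
    "--roboflow_api_key=key",
])
tags = split_tags(args.tags)
print(tags)
```

Splitting on commas and trimming whitespace means tags can contain spaces, such as “construction site”, without any extra quoting inside the list.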

The script will return a tag for each image. Let’s try it out on an image:

The script that calls the image tagging API returned the following response:

Most similar tag: construction site

If you want to learn more about how to query the API, check out the Python code in the “app.py” script.

Behind the scenes, this script makes an HTTP POST request with a base64 encoded version of an image and the suggested tags. Roboflow returns probabilities that show how likely it is that each tag represents the image. The probabilities are returned in the same order as the tags sent in the request. The script then maps each tag to its probability and finds the highest value.
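
The two local steps in that process, encoding the image and picking the winning tag, can be sketched as follows. This is an illustration under stated assumptions rather than the script's actual code: the function names encode_image and most_similar_tag are hypothetical, the probabilities are made up, and the HTTP request itself is left out because its endpoint and payload format are not described here.

```python
import base64


def encode_image(image_bytes: bytes) -> str:
    """Return the base64 text form of raw image bytes, ready to place in a request body."""
    return base64.b64encode(image_bytes).decode("utf-8")


def most_similar_tag(tags: list[str], probabilities: list[float]) -> tuple[str, dict[str, float]]:
    """Pair each tag with the probability at the same position and pick the highest."""
    if not tags or len(tags) != len(probabilities):
        raise ValueError("Expected one probability per tag.")
    scores = dict(zip(tags, probabilities))
    best_tag = max(scores, key=scores.get)
    return best_tag, scores


# Made-up probabilities for illustration only; real values come from the API response.
tags = ["construction site", "busy street", "something else"]
probabilities = [0.81, 0.12, 0.07]
best_tag, scores = most_similar_tag(tags, probabilities)
print(f"Most similar tag: {best_tag}")
```

Because the response carries no tag names, the pairing depends entirely on position, which is why the sketch rejects a response whose length differs from the tag list.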

Read the source code in the "app.py" script to learn more about how the script works.

Find a Specific Tagging API

The Roboflow image tagging API is able to assign tags that relate to a wide range of concepts. With that said, our image tagging API cannot understand every concept in a tag. That is where APIs that serve fine-tuned models come in.

Below, we will show how to find a pre-built tagging API and share resources on how you can build your own custom system to detect objects in an image or tag images.

Find a Detection System on Roboflow Universe

Consider a scenario where we want to be able to tag an image depending on whether a shelf is full or contains empty slots. A general image tagging API will not perform as well as an API dedicated to this specific purpose. This is because we would ideally want to be able to count the number of products and empty spaces.

We can explore Roboflow Universe to find an API to help us tag our images. Roboflow Universe features over 50,000 object detection and classification APIs you can use to tag an image. Object detection APIs let you find where a specific object is in an image. Classification APIs return a tag or multiple tags for an entire image.

We can search for “retail cooler model” on Roboflow Universe to see if there is a computer vision model that can assist us.

Click through the models to see if there is one that meets your needs. If you have not found any relevant results on the default “Subject” tab, click “Metadata” to explore more results.

The first result on the “Metadata” tab, Retail Coolers, looks promising. You can see the classes a model can identify in the “Classes” list on the page associated with a model.

To test out the model, click “Model” on the left side of the page. A box will appear in which you can upload images to test. You can also select images from the test set in the dataset on which the model was trained.

Here is an example of the API running:

In this example, there are three positions on the shelf in which there are no products stocked. The empty positions are denoted by the purple bounding boxes in the image above. Empty positions are assigned the class “empty”. All of the predictions returned by the model are listed in the JSON output on the left side.

We could write some logic that gives an image a special tag if the model returns one or more instances of the “empty” class.
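
A minimal sketch of that logic is shown below. It assumes the model's predictions have already been loaded into a list of dictionaries, each with a "class" and a "confidence" key; check the JSON output on the model page for the exact fields your model returns. The tag name “has empty slots”, the “product” class, and the function names are hypothetical.

```python
def count_class(predictions: list[dict], class_name: str, min_confidence: float = 0.0) -> int:
    """Count predictions of one class at or above a confidence threshold."""
    return sum(
        1
        for prediction in predictions
        if prediction.get("class") == class_name
        and prediction.get("confidence", 0.0) >= min_confidence
    )


def tag_shelf_image(predictions: list[dict], empty_class: str = "empty",
                    tag: str = "has empty slots") -> list[str]:
    """Return the special tag when at least one empty position was detected."""
    return [tag] if count_class(predictions, empty_class) >= 1 else []


# Made-up predictions: three empty positions and two stocked products.
predictions = [
    {"class": "empty", "confidence": 0.91},
    {"class": "empty", "confidence": 0.88},
    {"class": "empty", "confidence": 0.76},
    {"class": "product", "confidence": 0.95},
    {"class": "product", "confidence": 0.93},
]
image_tags = tag_shelf_image(predictions)
print(image_tags)
```

Raising min_confidence in count_class is one way to avoid tagging an image on the strength of a single low-confidence detection.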

In a supermarket, such a system could be combined with shelf numbers to inform staff when a shelf contains empty positions, or when a shelf contains more than a certain number of products.
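
One way such a report could be assembled is sketched below. It assumes you keep a mapping from a shelf number to the predictions for that shelf's image, with each prediction carrying a "class" key; the shelf identifiers, the “product” class, the max_products limit, and the function name shelf_alerts are all hypothetical.

```python
from typing import Optional


def shelf_alerts(predictions_by_shelf: dict[str, list[dict]], empty_class: str = "empty",
                 max_products: Optional[int] = None) -> dict[str, list[str]]:
    """Build a per-shelf list of messages for staff; shelves with no issues are omitted."""
    alerts: dict[str, list[str]] = {}
    for shelf, predictions in predictions_by_shelf.items():
        classes = [prediction["class"] for prediction in predictions]
        empty_count = classes.count(empty_class)
        # Every detection that is not an empty position is treated as a product.
        product_count = len(classes) - empty_count
        messages = []
        if empty_count >= 1:
            messages.append(f"{empty_count} empty position(s)")
        if max_products is not None and product_count > max_products:
            messages.append(f"{product_count} products exceeds limit of {max_products}")
        if messages:
            alerts[shelf] = messages
    return alerts


# Made-up detections for two shelves.
predictions_by_shelf = {
    "A1": [{"class": "empty"}, {"class": "product"}, {"class": "product"}],
    "A2": [{"class": "product"}] * 4,
}
alerts = shelf_alerts(predictions_by_shelf, max_products=3)
print(alerts)
```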

Create a Custom System to Tag Images

You can also create your own custom system to tag images. There are two main options: train a classification model or train an object detection model.

Classification models return one or more tags that represent the contents of an entire image. Object detection models, on the other hand, return the exact positions of objects of interest in an image. The Roboflow image tagging API uses classification whereas the retail cooler example in the last section is object detection.

To learn more about training your own model on Roboflow, check out the Roboflow Getting Started guide. With help from our getting started guide, you can train the first version of a model in an afternoon without any prior computer vision experience.

Next Steps

The Roboflow image tagging API, powered by CLIP, enables you to provide an image and a list of labels. The API will return values that show how relevant an image is to each label. You can then assign the tag with the maximum corresponding score to an image.

A general tagging API cannot identify all objects, however. If you need to identify an object that the API struggles with, or if you need to know exactly where an object is in an image, we recommend exploring Roboflow Universe to find a fine-tuned model. You can also train your own model to find exactly the objects you want to identify.

When you are ready to start building logic that uses the tagging API, check out supervision, an open source Python package with a range of utilities you can use to build applications that use computer vision.

With supervision, you can:

  1. Filter predictions by class, box area, confidence, and more.
  2. Plot object detection and segmentation predictions on an image.
  3. Use ByteTrack for object tracking.
  4. Use SAHI for small object detection.
  5. And more.

To see the full range of capabilities available in supervision, read the supervision documentation.