Historically, building a robust search engine for images was difficult. One could search by features such as file name and image metadata, and use any context around an image (e.g. alt text, or surrounding text if the image appears in a passage) to provide richer search features. That was before the advent of neural networks that can identify images semantically related to a given user query.

OpenAI's Contrastive Language-Image Pre-Training (CLIP) model provides the means to implement a semantic search engine in a few dozen lines of code. The CLIP model was trained on millions of pairs of images and text, learning a shared embedding space that encodes the semantics of both. Given a text query, CLIP can be used to find the images most related to that query.

In this guide, we're going to walk through how to build a semantic search engine using Supabase and OpenAI's CLIP model hosted via Roboflow.

Without further ado, let's get started!

Using Image Embeddings

Embeddings are numeric representations of data such as text and images. Embeddings are calculated using a model such as CLIP, which was trained on pairs of images and text. Through that training process, CLIP learned to encode the semantics of image contents. We can build a search engine on top of embeddings. To do so, we need to:

  1. Calculate embeddings for all of the images in our dataset;
  2. Calculate a text embedding for a user query (e.g. "hard hat" or "car"); and
  3. Compare the text embedding to the image embeddings to find related embeddings.

The closer two embeddings are, the more similar the content they represent.
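For intuition, a common way to measure "closeness" is cosine similarity. Below is a minimal TypeScript sketch of that calculation; the hosted APIs and database we use later handle this for us, so the helper is purely illustrative (we also reuse it in a later example):

// Cosine similarity between two embedding vectors.
// 1 means the vectors point the same way (very similar content),
// 0 means unrelated, -1 means opposite.
function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}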

Let's talk through how to use the Roboflow API to embed images and text then store them in Supabase for search.

Setting Up CLIP via the Hosted Roboflow API

We'll use CLIP via Roboflow's hosted API endpoint, which scales automatically as we call the model. To use the Roboflow API you need an API key, which you can find in your account dashboard by following these instructions: https://blog.roboflow.com/pip-package/. We also use axios for API requests, which you can install from npm.
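As a rough setup sketch (the hard-coded placeholder key is just for illustration):

// npm install axios
import axios from "axios";

// Your Roboflow API key from the account dashboard.
// A hard-coded placeholder is used here only for illustration;
// keep real keys out of source control.
const roboflowAPIKey = "YOUR_ROBOFLOW_API_KEY";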

Getting a CLIP Embedding

The Roboflow API provides a /clip endpoint through which we can generate an embedding for both text and images.

To embed text we can use this TypeScript code:

// Request a CLIP text embedding from the hosted Roboflow API.
async function embedText(text: string, apiKey: string) {
    const response = await axios({
        method: "POST",
        url: "https://infer.roboflow.com/clip/embed_text",
        params: {
            api_key: apiKey
        },
        data: {
            clip_version_id: "ViT-B-16",
            text: text
        },
        headers: {
            "Content-Type": "application/json"
        }
    });

    return response.data.embeddings[0];
}

Similarly, we can embed an image by uploading it as a base64-encoded string:

// Request a CLIP embedding for a base64-encoded image from the hosted Roboflow API.
async function embedImage(file: string, apiKey: string) {
    const response = await axios({
        method: "POST",
        url: `https://infer.roboflow.com/clip/embed_image`,
        params: {
            api_key: apiKey
        },
        data: {
            clip_version_id: "ViT-B-16",
            image: [
                {
                  type: "base64",
                  value: file
                }
              ]
        },
        headers: {
            "Content-Type": "application/json"
        }
    });

    return response.data.embeddings[0];
}

Both functions return a vector of length 512 representing the semantics of the content. These embeddings can be compared across text and images, which lets us search images with a text query or classify an image against a database of text tags.
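As an illustration, here is a hedged sketch of zero-shot tagging built from the two helpers above and the cosineSimilarity function from earlier; the tag list is made up for this example:

// Pick the text tag whose CLIP embedding is closest to the image's embedding.
async function classifyImage(base64Image: string, tags: string[], apiKey: string) {
    const imageEmbedding = await embedImage(base64Image, apiKey);
    const tagEmbeddings = await Promise.all(tags.map((tag) => embedText(tag, apiKey)));

    let best = { tag: tags[0], similarity: -Infinity };
    tagEmbeddings.forEach((embedding, i) => {
        const similarity = cosineSimilarity(imageEmbedding, embedding);
        if (similarity > best.similarity) {
            best = { tag: tags[i], similarity };
        }
    });

    return best;
}

// e.g. classifyImage(base64Image, ["hard hat", "car", "dog"], roboflowAPIKey)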

An easy way to build the supporting search infrastructure is the pgvector extension on Supabase, which adds a vector column type and operators for comparing embeddings, such as the cosine distance operator (<=>).

Setting up Supabase

For semantic search we are going to use Supabase's PostgreSQL-powered database and the pgvector extension. The following instructions were adapted from the Supabase tutorial on storing OpenAI text embeddings in Postgres: https://supabase.com/blog/openai-embeddings-postgres-vector.

First, you will need to create a Supabase project, enable the pgvector extension, and create a table to store the image embeddings.

You can create a Supabase project at https://supabase.com/. Then open the SQL Editor and run the following commands to enable pgvector and create a table.

create extension vector;
create table images (
  id bigserial primary key,
  image text,
  embedding vector(512)
);

Our new images table stores each image's file name (its path in Supabase Storage) in the image column as text, and the embedding from the CLIP API in the embedding column as a 512-dimensional vector.

In order to search these images later, we have to add a PostgreSQL function that compares vectors using cosine similarity. Run the following SQL to add the function, which we can call later over RPC.

create or replace function match_images (
  query_embedding vector(512),
  match_threshold float,
  match_count int
)
returns table (
  id bigint,
  image text,
  similarity float
)
language sql stable
as $$
  select
    images.id,
    images.image,
    1 - (images.embedding <=> query_embedding) as similarity
  from images
  where 1 - (images.embedding <=> query_embedding) > match_threshold
  order by similarity desc
  limit match_count;
$$;

Since our database could grow large, we add an ivfflat index on the embedding column to keep similarity search efficient; the index partitions the vectors into lists so that queries only scan the most relevant partitions rather than every row. Run the following SQL to add the index:

create index on images using ivfflat (embedding vector_cosine_ops)
with
  (lists = 100);

Lastly, install and initialize the Supabase JavaScript client library, which you can do using these instructions: https://supabase.com/docs/reference/javascript/installing and https://supabase.com/docs/reference/javascript/initializing.
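For reference, a minimal initialization sketch; the project URL and anon key below are placeholders you replace with values from your project's API settings:

// npm install @supabase/supabase-js
import { createClient } from "@supabase/supabase-js";

// Replace with your project's URL and anon (public) key.
const supabase = createClient(
    "https://your-project-ref.supabase.co",
    "your-anon-key"
);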

Adding Images and Embeddings to Supabase

In order to search our images later, we first need to embed and upload them to Supabase. The following function reads a File, embeds its base64 contents with CLIP, uploads the file to Supabase Storage, and inserts a row into our images table.

export const uploadImage = async (image: File) => new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onloadend = async () => {
        // reader.result is a data URL ("data:image/png;base64,...");
        // keep only the raw base64 payload after the comma for the CLIP API.
        const base64 = (reader.result as string).split(',')[1]
        const embedding = await embedImage(base64, roboflowAPIKey)

        // Name the file with a timestamp, keeping the original extension.
        const imageName = `${Date.now()}.${image.name.split('.').pop()}`

        // Upload the original file to Supabase Storage.
        const { error } = await supabase
          .storage
          .from('images')
          .upload(imageName, image, {
            cacheControl: '3600',
            upsert: false
        })
        if (error) {
            reject(error)
            return
        }

        // Store the file name and its CLIP embedding in the images table.
        await supabase.from("images").insert({ image: imageName, embedding })
        resolve({})
    }
    reader.readAsDataURL(image);
})

Simply pass a File object, such as one selected through an HTML <input type="file" /> element, into the uploadImage function and it will be embedded and uploaded to our database.
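For example, a minimal sketch of wiring this to a plain file input; the element id is an assumption:

// Assumes an element like <input type="file" id="image-upload" /> on the page.
const input = document.getElementById("image-upload") as HTMLInputElement;

input.addEventListener("change", async () => {
    const file = input.files?.[0];
    if (file) {
        await uploadImage(file);
        console.log("Image embedded and uploaded");
    }
});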

Searching Images by Embedding Using Supabase

Now we can search our database of images semantically using text. First we embed the query text using the CLIP API, then call the RPC function we created earlier, which uses cosine similarity to query our database.

The following function takes the query text, similarity threshold (from 0 to 1), and max number of results, then returns a list of images.

// Search the images table for rows whose embeddings are close to the query text's embedding.
export const searchDatabase = async (text: string, threshold: number, count: number) => {
    const embedding = await embedText(text, roboflowAPIKey)
    const { data, error } = await supabase.rpc('match_images', {
        query_embedding: embedding,
        match_threshold: threshold,
        match_count: count,
    })
    return data
}

This function returns a list of objects like the one below, ordered from most to least similar:

{id: 8, image: '123.jpg', similarity: 0.263024778339981}

Download Images from Supabase Storage to View

Lastly, we must download the images from Supabase Storage in order to view the search results. The following function takes an image name (the image field returned by searchDatabase) and returns the image data as a Blob.

export const downloadImage = async (image: string) => {
    const { data, error } = await supabase
        .storage
        .from('images')
        .download(image)
    return data
}

Only the Beginning

Storing and searching image embeddings opens up a wide array of capabilities. A user can search "dog" and have images of dogs returned automatically based on their CLIP embeddings. The same approach can be adapted for content filtering.

All of the above is available today for free: the CLIP API via Roboflow and the pgvector extension on Supabase. Happy building!