Preview: Roboflow + GPT-4

This post is part of Dataset Day in Roboflow Launch Week 2023, during which we announced many new additions to our data management and annotation solutions. To see all of the announcements from launch week, check out the Roboflow Launch Week page.

We're entering a new era of multimodal LLMs like GPT-4: models with a broad understanding of the world, gained from training on vast amounts of human knowledge shared on the Internet, and with rapidly advancing reasoning and task-completion capabilities.

This is a brave new world, and Roboflow is excited to help you use GPT-4 to level up your computer vision capabilities. Roboflow and GPT-4 will be even more powerful when used in conjunction, and in this post we preview some of the new features arriving in Roboflow over the coming weeks.

Evaluation

When the multimodal APIs for GPT-4 are released, we plan to support our users and everyone interested in using GPT-4 for vision in testing the model and evaluating it on their task.

Without access to the multimodal APIs for GPT-4, it is difficult to say how many and which tasks will be better solved by the general model, but we spent some time speculating on how GPT-4 will change computer vision, which computer vision tasks GPT-4 may be better at, and which new tasks GPT-4 unlocks.

In many cases, GPT-4 may fall short of the precision required to get an application into production, but understanding the tradeoffs is the first step. Comparing how GPT-4 performs out of the box against other zero-shot models and traditional fine-tuned models will be essential.
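As a rough illustration of what that comparison could look like, here is a minimal sketch in plain Python that scores a zero-shot model and a fine-tuned model against the same ground truth and breaks errors down by class. The prediction lists here are made up for illustration; in practice they would come from GPT-4 and from your trained model.

```python
# Minimal sketch: compare a zero-shot model against a fine-tuned model
# on the same labeled evaluation set. The prediction lists below are
# placeholders for the outputs of whichever models you are evaluating.

from collections import Counter

ground_truth = ["cat", "dog", "dog", "bird", "cat"]
zero_shot_preds = ["cat", "dog", "cat", "bird", "cat"]   # e.g. GPT-4 out of the box
fine_tuned_preds = ["cat", "dog", "dog", "bird", "dog"]  # e.g. a fine-tuned CNN

def accuracy(preds, labels):
    """Fraction of predictions that match the ground truth."""
    correct = sum(p == l for p, l in zip(preds, labels))
    return correct / len(labels)

def errors_by_class(preds, labels):
    """Count mistakes grouped by the true class, to see where a model struggles."""
    return Counter(l for p, l in zip(preds, labels) if p != l)

print("zero-shot accuracy:", accuracy(zero_shot_preds, ground_truth))
print("fine-tuned accuracy:", accuracy(fine_tuned_preds, ground_truth))
print("zero-shot errors by class:", errors_by_class(zero_shot_preds, ground_truth))
print("fine-tuned errors by class:", errors_by_class(fine_tuned_preds, ground_truth))
```

The same pattern extends to detection and segmentation tasks by swapping accuracy for task-appropriate metrics like mAP.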

Distillation

We will be providing software processes to distill knowledge from GPT-4 to create better data for training your own custom models that can run in your own environment and on the edge.

On the language side, projects like Alpaca have demonstrated how quickly and cheaply knowledge from GPT models can be distilled using techniques like self-instruct, where outputs from a GPT model are used to supervise a smaller open source model such as LLaMA.

The techniques will look different in computer vision, but we will be working alongside the community to figure them out - whether that means zero-shot labeling techniques to supervise CNNs or, eventually, multimodal model distillation.
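As a sketch of the zero-shot labeling direction, the example below uses CLIP via Hugging Face Transformers as a stand-in for a future multimodal GPT-4 endpoint, pseudo-labeling images against a small ontology. The class names and image paths are placeholders, and in practice the resulting labels would still get human review before supervising a smaller model.

```python
# Minimal sketch of zero-shot pseudo-labeling: an open zero-shot model (CLIP)
# stands in for a future multimodal GPT-4 endpoint. The resulting labels could
# then be used to supervise a smaller CNN that runs on the edge.
# Class names and image paths below are placeholders.

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class_names = ["forklift", "pallet", "person"]        # placeholder ontology
prompts = [f"a photo of a {name}" for name in class_names]
image_paths = ["frame_0001.jpg", "frame_0002.jpg"]    # placeholder images

pseudo_labels = {}
for path in image_paths:
    image = Image.open(path).convert("RGB")
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    outputs = model(**inputs)
    # logits_per_image: similarity of this image to each text prompt
    probs = outputs.logits_per_image.softmax(dim=1)
    pseudo_labels[path] = class_names[probs.argmax().item()]

print(pseudo_labels)  # e.g. {"frame_0001.jpg": "forklift", ...}
```

These pseudo-labels can then be uploaded to a Roboflow project and refined with a human in the loop before training a compact model.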

Few-Shot

GPT-4's general knowledge is complemented by relevant context within an area of interest. In text applications, people often achieve this by semantically embedding relevant documents into a database; at query time, they search their document store for relevant information and feed that context to GPT.

With multimodal understanding, you will be able to feed images as context to GPT. We will be building supporting infrastructure at Roboflow to help deploy few-shot, context-based GPT-4 queries, so users can run queries that draw context from their custom datasets.
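A minimal sketch of the text-only version of this pattern is below, using the pre-1.0 `openai` Python package. The documents, query, and thresholds are illustrative; once multimodal endpoints exist, the retrieved context could include images from your Roboflow dataset instead of text snippets.

```python
# Minimal sketch of retrieval-augmented prompting with text context, using the
# pre-1.0 `openai` Python package. With multimodal APIs, the retrieved context
# could be images drawn from a custom dataset rather than text documents.

import numpy as np
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder

documents = [
    "Hard hats must be worn at all times on the factory floor.",
    "Forklifts may only operate in the marked loading zone.",
]

def embed(text):
    """Embed a piece of text with OpenAI's embedding endpoint."""
    response = openai.Embedding.create(model="text-embedding-ada-002", input=[text])
    return np.array(response["data"][0]["embedding"])

doc_vectors = [embed(doc) for doc in documents]

def retrieve(query, k=1):
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vectors]
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

query = "Is anyone in this scene violating safety rules?"
context = "\n".join(retrieve(query))

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": query},
    ],
)
print(response["choices"][0]["message"]["content"])
```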

Active Learning

While GPT-4 has strong general intelligence, the model behind the API is fixed. To evolve a model alongside a shifting environment or to address new edge cases, active learning is required - a process where images of edge cases are gathered from the production environment for retraining.

We will be supporting the process of capturing images from a GPT-4 application and adding them to a training corpus for human-in-the-loop review, to be used in both distillation and few-shot learning.

When OpenAI releases fine-tuning APIs to train multimodal endpoints for GPT-4, we will support those endpoints for training as well (just like we already do for dozens of other models).
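As a sketch of what that capture step could look like with today's tooling, the example below uses the `roboflow` Python package to send edge-case frames back to a project. The project name and the `is_edge_case` heuristic are placeholders for your own application logic.

```python
# Minimal sketch of an active learning hook: frames the production application
# is unsure about are uploaded to a Roboflow project for human review and
# later retraining. The project name and the `is_edge_case` check are
# placeholders for your own application logic.

from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")
project = rf.workspace().project("your-project-name")

def is_edge_case(prediction):
    """Placeholder heuristic: flag low-confidence or unexpected predictions."""
    return prediction["confidence"] < 0.5

def on_frame(image_path, prediction):
    """Call this from your production loop for each processed frame."""
    if is_edge_case(prediction):
        # Send the raw image back to the dataset; annotation happens later,
        # with a human in the loop (or a zero-shot/distilled labeler).
        project.upload(image_path)

on_frame("frame_0042.jpg", {"class": "forklift", "confidence": 0.31})
```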

Dataset Assistant

Another exciting area of the Roboflow+GPT-4 roadmap is leveraging GPT-4 as a Dataset Assistant. We will be building features like automatic dataset ontology, zero-shot labeling, magic preprocessing and augmentation, and automatic dataset health checks and tips for improving model performance.

All of these techniques will be designed to make the Roboflow application more capable and intuitive, leveraging GPT-4's general knowledge.

The first GPT-4 powered feature inside of Roboflow is live today! You can now use GPT-4 to create a README for your Roboflow Universe project. To try it, go to your project's Overview page in Roboflow and click "Fill with GPT-4". After linking your OpenAI API key to your Roboflow account, you'll get a list of use cases based on an analysis of your images and project metadata.


Knowledge-Base

We have launched an integration in our website's chat widget leveraging kapa.ai, a GPT-4 powered bot that scans our website's content (including blog posts like this one) to give users the best answers it can, drawing on the Roboflow knowledge base.

It's extremely powerful. It is an expert at Roboflow and computer vision, and it even knows how to code.

Asking kapa.ai to chain two Roboflow models together, given the context of our Roboflow documentation

To try it, just start a conversation with our in-app chat widget (on the bottom right of your screen once you've signed into Roboflow).

Get Started Today

Roboflow and GPT-4 will be more powerful when used in concert. We are excited to be building features on top of GPT-4 and to improve the efficacy of general AI models. And we have already started shipping some of these features today! Try out the auto-description feature and our GPT-4 powered chatbot, and let us know what you think.

If you'd like early access to the rest of our upcoming GPT-4 powered functionality, be sure to join the waitlist.