Multimodal

27 Nov 2023 • 5 min read

How to Load CLIP Image Embeddings into LanceDB

Learn how to calculate CLIP embeddings using Roboflow Inference and save them into LanceDB.

23 Nov 2023 • 7 min read

GPT-4 Vision Alternatives

Explore alternatives to GPT-4 Vision with Large Multimodal Models such as Qwen-VL and CogVLM, and fine-tuned detection models.

16 Nov 2023 • 5 min read

What is Retrieval Augmented Generation?

Learn what Retrieval Augmented Generation (RAG) is, how it works, and how RAG can be used in computer vision applications.

15 Nov 2023 • 4 min read

How to Use Roboflow with GPT-4 Vision

Explore ways you can use Roboflow with GPT-4 Vision to solve computer vision problems.

7 Nov 2023 • 4 min read

Distilling GPT-4 for Classification with an API

In this guide, learn how to distill GPT-4V to train an image classification model.

7 Nov 2023 • 4 min read

DINO-GPT4-V: Use GPT-4V in a Two-Stage Detection Model

In this guide, we introduce DINO-GPT4V, a model that uses Grounding DINO to detect general objects and GPT-4V to refine labels.

7 Nov 2023 • 5 min read

How CLIP and GPT-4V Compare for Classification

In this post, we analyze how CLIP and GPT-4V compare for classification.

7 Nov 2023 • 5 min read

Experiments with GPT-4V for Object Detection

See our experiments that explore GPT-4V's object detection capabilities.

16 Oct 2023 • 4 min read

GPT-4 Vision Prompt Injection

In this article, we explore what prompt injection is and the techniques people have been using to perform prompt injection attacks on GPT-4.

10 Oct 2023 • 6 min read

First Impressions with LLaVA-1.5

In this guide, we share our first impressions testing LLaVA-1.5.

27 Sep 2023 • 11 min read

GPT-4 with Vision: Complete Guide and Evaluation

In this guide, we share findings experimenting with GPT-4 with Vision, released by OpenAI in September 2023.

1 Aug 2023 • 4 min read

Using Stable Diffusion and SAM to Modify Image Contents Zero Shot

Recent breakthroughs in large language models (LLMs) and foundation computer vision models have unlocked new interfaces and methods for editing images or videos. You may have heard of inpainting, outpainting, generative fill, and text-to-image; this post will show you how to execute those new generative AI functions

17 Jul 2023 • 5 min read

How to Build a Semantic Image Search Engine with Supabase and OpenAI CLIP

Historically, building a robust search engine for images was difficult. One could search by features such as file name and image metadata, and use any context around an image (e.g., alt text, or surrounding text if the image appears in a passage of text) to provide a richer search feature.

12 Jul 2023 • 7 min read

ChatGPT Code Interpreter for Computer Vision

In this article, we share the results of our experimentation with ChatGPT's code interpreter feature on various computer vision tasks.

7 Jul 2023 • 7 min read

How Good Is Bing (GPT-4) Multimodality?

In this blog post, we qualitatively analyze how well Bing’s combination of text and image input ability performs at object detection tasks.

10 May 2023 • 12 min read

Multimodal Models and Computer Vision: A Deep Dive

In this post, we discuss what multimodal models are, how they work, and their impact on solving computer vision problems.

21 Apr 2023 • 5 min read

Zero-Shot Image Annotation with Grounding DINO and SAM - A Notebook Tutorial

In this comprehensive tutorial, discover how to speed up your image annotation process using Grounding DINO and Segment Anything Model. Learn how to convert object detection datasets into instance segmentation datasets, and use these models to automatically annotate your images.

16 Mar 2023 • 10 min read

Speculating on How GPT-4 Changes Computer Vision

OpenAI released GPT-4, showcasing strong multimodal general AI capabilities in addition to impressive logical reasoning. Are general models going to obviate the need to label images and train models?

25 Jul 2021 • 9 min read

Experimenting with CLIP and VQGAN to Create AI Generated Art

Earlier this year, OpenAI announced a powerful art-creation model called DALL-E. Their model hasn't yet been released but it has captured the imagination of a generation of hackers, artists, and AI-enthusiasts who have been experimenting with using the ideas behind it to replicate the results on their own.

8 Jan 2021 • 5 min read

How to Try CLIP: OpenAI's Zero-Shot Image Classifier

Earlier this week, OpenAI dropped a bomb on the computer vision world.
