Multimodal Maestro: Advanced LMM Prompting

Learn how to expand the capabilities of Large Multimodal Models (LMMs) using Multimodal Maestro.

GPT-4 Vision Alternatives

Explore alternatives to GPT-4 Vision, including Large Multimodal Models such as Qwen-VL and CogVLM, as well as fine-tuned detection models.

How to Use Roboflow with GPT-4 Vision

Explore ways you can use Roboflow with GPT-4 Vision to solve computer vision problems.

Distilling GPT-4 for Classification with an API

In this guide, learn how to distill GPT-4V to train an image classification model.

DINO-GPT4-V: Use GPT-4V in a Two-Stage Detection Model

In this guide, we introduce DINO-GPT4-V, a model that uses Grounding DINO to detect general objects and GPT-4V to refine their labels.

How CLIP and GPT-4V Compare for Classification

In this post, we analyze how CLIP and GPT-4V compare for classification.

Experiments with GPT-4V for Object Detection

See our experiments that explore GPT-4V's object detection capabilities.

GPT-4 Vision Prompt Injection

In this article, we explore what prompt injection is and the techniques people have been using to perform prompt injection attacks on GPT-4.

GPT-4 with Vision: Complete Guide and Evaluation

In this guide, we share our findings from experimenting with GPT-4 with Vision, released by OpenAI in September 2023.

ChatGPT Code Interpreter for Computer Vision

In this article, we share the results of our experimentation with ChatGPT's code interpreter feature on various computer vision tasks.

How Good Is Bing (GPT-4) Multimodality?

In this blog post, we qualitatively analyze how well Bing’s combined text and image input capability performs on object detection tasks.

Speculating on How GPT-4 Changes Computer Vision

OpenAI released GPT-4, showcasing strong multimodal general AI capabilities in addition to impressive logical reasoning. Are general models going to obviate the need to label images and train models?