ChatGPT Code Interpreter for Computer Vision
Published Jul 12, 2023 • 7 min read

What is the Code Interpreter Plugin by OpenAI?

Code interpreter (CI) is an official ChatGPT plugin by OpenAI that lets the model write and run Python code in a sandboxed environment. Tasks such as data analysis, image conversion, and code editing can now be performed directly through the text interface.

GPT-4 + code interpreter plugin

New ChatGPT Capabilities with Code Interpreter

The code interpreter plugin can handle file uploads and downloads. This allows you to work directly with data files, including images and videos, which is particularly useful in computer vision. It also supports common data formats such as CSV and JSON.

Another distinctive aspect of code interpreter is that it reads the output of the code it runs, including error messages, and uses that output to correct its own mistakes. This brings a new dimension to ChatGPT, bridging the gap between natural language understanding and code execution.

Limitations with the Code Interpreter Plugin

While code interpreter brings great power and flexibility, it currently has limitations.

  • Internet Access: Code interpreter does not have access to the internet, which means it can’t directly fetch data from the web or interact with online APIs.
  • File Size: The maximum file size that can be uploaded is 250 MB. To work around this, you can compress your data into a zip file to lower its size. Remember, however, that the uncompressed data still needs to fit within the available memory.
  • Language Support: Currently, code interpreter only supports Python code.
  • Python Packages: Installation of external Python packages is not permitted. However, the coding environment comes pre-installed with over 330 packages, including numpy for numerical computations, pandas for data manipulation and analysis, matplotlib for data visualization, and OpenCV for computer vision tasks.
  • Environment Persistence: If the environment dies, the entire state is lost. Any generated files also become inaccessible as their download links stop working.
  • Knowledge Cut-off: The underlying model, GPT-4, has a “knowledge cut-off”: it is unaware of events that occurred after its training data was collected.
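
Since new packages cannot be installed, it can help to check up front which libraries a session actually offers. The snippet below is a minimal sketch of such a check using only the standard library; the function name `missing_modules` and the list of module names are illustrative assumptions, not something taken from the code interpreter environment itself.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported here.

    Only looks the modules up; nothing is actually imported.
    """
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Import names (not pip names) of the libraries mentioned above.
    wanted = ["numpy", "pandas", "matplotlib", "cv2"]
    print("Not available:", missing_modules(wanted))
```

Asking ChatGPT to run a check like this at the start of a session is cheaper than discovering a missing dependency halfway through a pipeline.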

Data Analysis with Code Interpreter

Code interpreter is a game-changer for data analysis. You can interactively perform complex data transformations, statistical analysis, and visualizations. The best part? All this is done conversationally, making the process intuitive, engaging, and approachable for non-technical users.

Visualizations created by Ethan Mollick — ChatGPT Code Interpreter user who doesn’t know Python.

Using Code Interpreter for Computer Vision

Now, let’s delve into how we can harness the power of code interpreter for computer vision tasks. Interestingly, while code interpreter comes pre-installed with powerful libraries such as TensorFlow and PyTorch, ChatGPT will insist that using deep learning models is not possible.

We decided to get more creative and solve computer vision problems leveraging old-school libraries like OpenCV and Tesseract. Remarkably, this entire process was conducted in natural language: we didn’t manually write a single line of code. The results were quite promising. It makes one imagine a future where AI-assisted development could revolutionize the field of computer vision. With tools like code interpreter, this future doesn’t seem far off.

Face Detection with Code Interpreter

Face detection is a fundamental task in computer vision. We decided to tackle this using a classic method available through OpenCV: the Haar Cascade classifier. Haar Cascade, while being a powerful tool for face detection, has limitations. It is not as robust or accurate as modern neural network-based methods and often produces false positives.

Face detection using Haar Cascades

However, the way code interpreter handled this problem was truly impressive. Upon encountering the problem of false positives, we provided a detailed prompt describing what was happening and our hunch on why. Astonishingly, with just a single prompt, code interpreter was able to eliminate the false positives. Compare this process with a traditional approach to face detection to get a feel for the difficulty of this task. This instance highlighted the remarkable power and flexibility of the plugin, demonstrating its effectiveness even when working with traditional methods like Haar Cascade. See the steps to run face detection with code interpreter.
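
For readers who want a feel for what such a pipeline looks like, here is a minimal sketch. It is not the code ChatGPT generated in our session. `detect_faces` uses OpenCV's bundled frontal-face cascade, and `drop_small_boxes` is a hypothetical filter built on the assumption that false positives are much smaller than real faces; the `min_ratio` threshold is likewise an illustrative guess.

```python
def detect_faces(image_path, min_neighbors=5, min_size=(30, 30)):
    """Run OpenCV's bundled frontal-face Haar cascade on an image.

    Returns a list of (x, y, w, h) tuples.
    """
    import cv2  # imported lazily so the filter below works without OpenCV

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    boxes = cascade.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=min_neighbors, minSize=min_size
    )
    return [tuple(int(v) for v in box) for box in boxes]


def drop_small_boxes(boxes, min_ratio=0.5):
    """Keep boxes whose area is at least `min_ratio` of the median area.

    Assumption: spurious detections are much smaller than real faces.
    """
    if not boxes:
        return []
    areas = sorted(w * h for _, _, w, h in boxes)
    median_area = areas[len(areas) // 2]
    return [b for b in boxes if b[2] * b[3] >= min_ratio * median_area]
```

Raising `min_neighbors` is another common way to trade recall for fewer false positives; which knob works better depends on the image.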

Detect, Track, and Count Objects with Code Interpreter

Object detection, tracking, and counting are critical tasks in many computer vision applications. Without access to advanced object detectors like YOLO, we had to think outside the box. We decided to leverage the characteristic color of the object to distinguish it from the background. The code interpreter did a phenomenal job designing a heuristic that allowed clean object detection.
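
A color-based detector of this kind can be sketched in a few lines of OpenCV. The version below is an illustration under our own assumptions, not the heuristic ChatGPT designed: the HSV bounds, the 5x5 opening kernel, and the `min_area` and `max_aspect_ratio` thresholds in the hypothetical `filter_boxes` helper all need tuning per video.

```python
def detect_by_color(frame_bgr, lower_hsv, upper_hsv):
    """Return (x, y, w, h) boxes around regions inside the HSV range."""
    import cv2  # imported lazily so filter_boxes works without OpenCV
    import numpy as np

    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(
        hsv,
        np.array(lower_hsv, dtype=np.uint8),
        np.array(upper_hsv, dtype=np.uint8),
    )
    # Morphological opening removes isolated specks from the mask.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # [-2] picks the contour list in both OpenCV 3.x and 4.x.
    contours = cv2.findContours(
        mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )[-2]
    return [tuple(cv2.boundingRect(c)) for c in contours]


def filter_boxes(boxes, min_area=500, max_aspect_ratio=3.0):
    """Drop boxes that are too small or too elongated to be the object."""
    kept = []
    for x, y, w, h in boxes:
        if w <= 0 or h <= 0:
            continue
        aspect = max(w / h, h / w)
        if w * h >= min_area and aspect <= max_aspect_ratio:
            kept.append((x, y, w, h))
    return kept
```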

Color-based object detection before filtering
Color-based object detection after filtering

Adding a tracker to the pipeline was surprisingly straightforward. We simply prompted the plugin to “track objects on the video,” and it was able to add this functionality to the pipeline. To get a feel for how incredible this is, compare this process to object tracking through traditional methods.
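
We don't know exactly which tracking method the plugin chose, but a simple centroid tracker illustrates the idea. The sketch below is an assumption-laden example: the class name `CentroidTracker`, the greedy nearest-centroid matching, and the `max_distance` threshold are ours, and tracks that find no match are dropped immediately rather than kept alive for a few frames.

```python
import math


class CentroidTracker:
    """Assign persistent IDs to boxes by greedy nearest-centroid matching."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance
        self.next_id = 0
        self.tracks = {}  # track id -> last known centroid (cx, cy)

    def update(self, boxes):
        """Take this frame's (x, y, w, h) boxes; return {track_id: centroid}."""
        centroids = [(x + w / 2, y + h / 2) for x, y, w, h in boxes]
        # Every (distance, track id, detection index) pair, closest first.
        pairs = sorted(
            (math.dist(prev, c), tid, i)
            for tid, prev in self.tracks.items()
            for i, c in enumerate(centroids)
        )
        updated, used = {}, set()
        for distance, tid, i in pairs:
            if distance > self.max_distance:
                break  # all remaining pairs are even farther apart
            if tid in updated or i in used:
                continue
            updated[tid] = centroids[i]
            used.add(i)
        # Detections that matched no existing track start new tracks.
        for i, c in enumerate(centroids):
            if i not in used:
                updated[self.next_id] = c
                self.next_id += 1
        self.tracks = updated  # unmatched old tracks are dropped
        return updated
```

Calling `update` once per frame with that frame's boxes yields IDs that stay stable as long as objects move less than `max_distance` pixels between frames.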

Counting posed a greater challenge. It seemed like there was some confusion in understanding our expectations. Or perhaps, as some might joke, ChatGPT isn’t great at math. After exchanging several messages and clarifying our requirements, we finally established a full pipeline for detecting, tracking, and counting objects. See the steps to detect, track, and count objects with code interpreter.
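
Part of the difficulty is that "counting" has to be defined precisely. One possible definition, shown here as a sketch rather than as the rule we ended up with, is to count each tracked object once when its centroid crosses a horizontal line. The `LineCounter` class and its `line_y` parameter are hypothetical, and the code assumes some tracker already supplies a stable ID and centroid for every object in each frame.

```python
class LineCounter:
    """Count each tracked object once when it crosses a horizontal line."""

    def __init__(self, line_y):
        self.line_y = line_y
        self.last_y = {}      # track id -> previous centroid y
        self.counted = set()  # track ids that have already been counted
        self.count = 0

    def update(self, tracks):
        """Take {track_id: (cx, cy)} for one frame; return the running count."""
        for tid, (_, cy) in tracks.items():
            prev = self.last_y.get(tid)
            # A crossing means the centroid switched sides of the line.
            crossed = prev is not None and (
                (prev < self.line_y) != (cy < self.line_y)
            )
            if crossed and tid not in self.counted:
                self.counted.add(tid)
                self.count += 1
            self.last_y[tid] = cy
        return self.count
```

Spelling out a rule like this in the prompt (which line, which direction, count once or on every crossing) leaves the model far less room to guess.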

Extract Text from Images with Code Interpreter

Extracting text from images, a process known as optical character recognition (OCR), was the most straightforward task in our experiments.

Using Code Interpreter to extract text from the image.

After Tesseract extracted the text, we could feed it into GPT-4, which then structured the information, making it easy to understand and analyze. See the steps to run text extraction with code interpreter.
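
The OCR step itself is short. The sketch below assumes the `pytesseract` wrapper and Pillow are available in the environment (we have not confirmed which binding the plugin used), and `clean_ocr_text` is a hypothetical helper that tidies the raw output before it is passed back to the model.

```python
def extract_text(image_path, lang="eng"):
    """Run Tesseract on an image via pytesseract and return the raw text."""
    from PIL import Image  # imported lazily; only needed for real OCR runs
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)


def clean_ocr_text(raw_text):
    """Strip surrounding whitespace and drop empty lines from OCR output."""
    lines = (line.strip() for line in raw_text.splitlines())
    return [line for line in lines if line]
```

Trimming blank lines before handing the text to GPT-4 also saves context window space.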

Leveraging GPT-4 to restructure and organize extracted text.

Looking to the Future and Navigating Restrictions

The exciting possibilities of combining code interpreter with advanced computer vision techniques are somewhat restrained by the current limitations of the environment. Modern computer vision models aren’t executable, and, as we mentioned earlier, installing external libraries isn’t possible in the code interpreter environment.

Installing Ultralytics YOLOv8 in the Code Interpreter environment

It turns out that all these restrictions are just suggestions. There are rarely physical limitations behind them. ChatGPT, through an appropriate system of prompts, has been convinced that certain operations are not possible. By using social engineering techniques, we can convince ChatGPT to break the rules.

ChatGPT's reaction after the "banned" command finished successfully.

This way, we were able to not only install external packages but also run the Ultralytics YOLOv8 model, giving ChatGPT the tools for a deeper understanding of image input.

Running Ultralytics YOLOv8 in the Code Interpreter environment

This peek into the future has only made us more excited about the potential applications, from automating data collection to developing new machine learning models. The possibilities seem endless, and we look forward to seeing these restrictions lifted in future iterations of the plugin. See the steps to run YOLOv8 with code interpreter.
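
As a rough illustration of what the model ends up running, the sketch below installs a package from uploaded wheel files and runs a YOLOv8 model on one image. This is an assumption on our part, not a record of the exact commands from the session: the wheel directory, the weights file name, and the helper names are all hypothetical, and both the wheels and the weights would have to be uploaded first because the environment has no internet access.

```python
import subprocess
import sys


def pip_install_command(package, wheel_dir):
    """Build a pip command that installs `package` from local wheels only."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # never contact PyPI (sandbox is offline)
        "--find-links", wheel_dir,  # look for wheels in this directory
        package,
    ]


def install_offline(package, wheel_dir):
    """Run the offline install; raises CalledProcessError on failure."""
    subprocess.run(pip_install_command(package, wheel_dir), check=True)


def detect_objects(image_path, weights_path="yolov8n.pt"):
    """Return (class_name, confidence, [x1, y1, x2, y2]) per detection."""
    from ultralytics import YOLO  # available only after the install step

    model = YOLO(weights_path)
    result = model(image_path)[0]  # one image in, one result out
    detections = []
    for box in result.boxes:
        class_name = result.names[int(box.cls[0])]
        detections.append(
            (class_name, float(box.conf[0]), box.xyxy[0].tolist())
        )
    return detections
```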

Practical Tips for Handling Code Interpreter

Here are a few practical tips for working with OpenAI’s code interpreter:

  • Always ask CI to make sure that imports and variables are defined. They constantly disappear from the context.
  • Code Interpreter is chatty and will always try to guide you step by step through the solution. Try not to print too many logs and results (like embedding values). They can consume your context window very quickly.
  • As we mentioned earlier, sessions with the code interpreter often reset, and with that, your files irretrievably disappear from the environment. Interestingly but annoyingly, ChatGPT does not know that the files are gone and proceeds as if they were still there, leading to unexpected errors. Always verify that the files are still in the environment.
  • Add `notalk;justgo` to the end of your prompts.
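
To guard against silent session resets, you can ask CI to run a quick check before each step. The snippet below is a minimal sketch; the helper names `missing_files` and `require_files` are our own, and the paths you pass in are whatever files your pipeline depends on.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that no longer exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


def require_files(paths):
    """Fail loudly if any expected file has disappeared."""
    missing = missing_files(paths)
    if missing:
        raise FileNotFoundError(
            "Session may have reset; re-upload: " + ", ".join(missing)
        )
```

A failure here is a clear signal to re-upload the data, instead of debugging confusing errors further down the pipeline.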

Conclusion

The code interpreter plugin is a powerful tool that can significantly enhance the capabilities of ChatGPT and help accelerate computer vision tasks.

Despite the current limitations, the potential applications of code interpreter in computer vision and other fields are enormous. As we continue to push the boundaries of what’s possible with AI, tools like code interpreter will undoubtedly play a crucial role.

If you want to follow more experiments or contribute examples of your own, check out this repo for the latest breakthroughs with code interpreter.

Cite this Post

Use the following entry to cite this post in your research:

Piotr Skalski. (Jul 12, 2023). ChatGPT Code Interpreter for Computer Vision. Roboflow Blog: https://blog.roboflow.com/chatgpt-code-interpreter-computer-vision/

Discuss this Post

If you have any questions about this blog post, start a discussion on the Roboflow Forum.

Written by

Piotr Skalski
ML Growth Engineer @ Roboflow | Owner @ github.com/SkalskiP/make-sense (2.4k stars) | Blogger @ skalskip.medium.com/ (4.5k followers)