CVPR 2023 Highlights

There was an electric feeling in the air at CVPR 2023 in Vancouver. Three members of the Roboflow team were in attendance, and Roboflow hosted a panel on the Roboflow 100 dataset.

The world's premier computer vision conference was packed with researchers and practitioners sharing the breakthroughs of the past year and looking ahead to the future of computer vision and AI at large.

In this post, we dissect the themes and highlights of CVPR 2023. They are both a reflection on the conference and a prediction of the major themes that will dominate the computer vision landscape in the coming year.

The Rise of the Vision Transformer

In AI research, the transformer architecture has made major strides in pushing the state of the art. The vision transformer (ViT) brought that architecture to computer vision: it splits an image into patches of pixels and treats the sequence of patches the way a language model treats a sequence of text tokens, allowing the same architecture to be used for vision tasks.
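To make the "patches as tokens" idea concrete, here is a minimal NumPy sketch of the first step of a vision transformer: cutting an image into non-overlapping square patches, flattening each one, and linearly projecting it into an embedding. It follows the original ViT recipe of 16x16 patches on a 224x224 image; the function names are our own, and the random projection matrix is a stand-in for weights a real model would learn.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into a row-major sequence of flattened patches.

    Returns shape (num_patches, patch_size * patch_size * C): one "token" per patch.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    # (H, W, C) -> (H/p, p, W/p, p, C): separate the patch grid from in-patch offsets
    grid = image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
    # -> (H/p, W/p, p, p, C) so the pixels of each patch are adjacent
    grid = grid.transpose(0, 2, 1, 3, 4)
    # -> (num_patches, p * p * C): flatten each patch into a single vector
    return grid.reshape(-1, patch_size * patch_size * c)

def embed_patches(patches: np.ndarray, projection: np.ndarray) -> np.ndarray:
    # Linear projection of each flattened patch into the embedding dimension.
    # In a real ViT these weights are learned; here the caller supplies them.
    return patches @ projection

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))               # stand-in for a normalized RGB image
tokens = patchify(image, 16)
print(tokens.shape)                             # (196, 768): a 14 x 14 grid of patches
projection = rng.standard_normal((768, 64))     # hypothetical 64-dim embedding
print(embed_patches(tokens, projection).shape)  # (196, 64)
```

A full ViT then prepends a learnable class token and adds position embeddings before passing the sequence to a standard transformer encoder; those steps are omitted from this sketch.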

At CVPR 2023, we saw a slew of new techniques related to the vision transformer with researchers working on analyzing its biases, pruning it, pretraining it, distilling it, reverse distilling it, and applying it to new tasks.

Some of our favorite papers in this category:

Computer Vision Yearns for a Foundational Model

General pre-trained models have been shown to be broad multi-task learners, removing the need for many of the more tedious, task-specific fine-tuning approaches to machine learning problems. In NLP, language models trained to predict the next token in text have proven to be foundational models whose efficacy scales with model size. In the computer vision research community, no comparable model and loss objective has yet emerged as the de facto foundation for CV tasks.
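For reference, the NLP objective mentioned above is simple to state: at every position, the model outputs a distribution over the vocabulary, and the loss is the cross-entropy against the token that actually comes next. The NumPy sketch below shows that loss under illustrative assumptions (the function name and array shapes are ours, not any particular library's API). Part of the appeal is that any raw text supplies its own labels, which is the kind of simple, scalable objective vision has not settled on.

```python
import numpy as np

def next_token_loss(logits: np.ndarray, tokens: np.ndarray) -> float:
    """Average cross-entropy of predicting tokens[t + 1] from position t.

    logits: (seq_len, vocab_size) model outputs, one row per position.
    tokens: (seq_len,) integer token ids for the same sequence.
    """
    # Position t predicts token t + 1: drop the last prediction and the first target.
    pred = logits[:-1]
    target = tokens[1:]
    # Numerically stable log-softmax over the vocabulary.
    shifted = pred - pred.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each true next token, averaged over positions.
    return float(-log_probs[np.arange(len(target)), target].mean())

vocab_size = 10
tokens = np.array([3, 1, 4, 1, 5])
uniform_logits = np.zeros((len(tokens), vocab_size))
print(next_token_loss(uniform_logits, tokens))  # ≈ 2.3026, i.e. log(10)
```

A model that knows nothing (uniform logits) pays log(vocab_size) per token; training pushes the loss toward zero by placing probability mass on the true next token.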

In AI academia, there is often an attitude of "doing more with less," as we heard at the Tuesday and Wednesday keynotes. The academic research community recognizes that it cannot compete with industrial research labs, which have access to vast computing resources, in building general models.

With that said, we saw numerous instances of research labs working on foundation models at CVPR, mostly at the intersection of language and images.

General pre-trained computer vision models that were heavily discussed at CVPR ranged in tactic and modality:

Grounding DINO: Zero-shot object detection, multi-modal
SAM: Zero-shot segmentation, image only
Multi-modal GPT-4 (not as much as we expected)
Florence: General task, multi-modal
OWL-ViT: Zero-shot object detection, multi-modal
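Several of the multi-modal models above build on an idea popularized by CLIP: embed images (or, in OWL-ViT's case, each predicted box) and free-form text prompts into a shared space, then label each image or box with the prompt whose embedding is most similar. The sketch below shows only that scoring step. It assumes the embeddings were already produced by some image and text encoders, which are stubbed out with hand-written vectors; the temperature value is an illustrative assumption, and models such as Grounding DINO add deeper cross-modal fusion that this sketch does not capture.

```python
import numpy as np

def zero_shot_scores(image_embeddings: np.ndarray,
                     text_embeddings: np.ndarray,
                     temperature: float = 0.01) -> np.ndarray:
    """Score each image (or region) against each text prompt.

    image_embeddings: (num_images, dim); text_embeddings: (num_prompts, dim).
    Returns (num_images, num_prompts) probabilities over the prompts.
    """
    # L2-normalize so the dot product equals cosine similarity.
    img = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    txt = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    # Scaled similarities act as logits (temperature value is an assumption).
    logits = img @ txt.T / temperature
    # Stable softmax over the prompts for each image.
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Hypothetical embeddings for two prompts, e.g. "a photo of a dog" and "a photo of a cat"
text_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
# Hypothetical embedding for one image region that resembles the first prompt
image_embeddings = np.array([[0.9, 0.1]])
probs = zero_shot_scores(image_embeddings, text_embeddings)
print(probs.argmax(axis=1))  # [0]
```

Because the classes are just text, swapping in new prompts changes what the model can recognize without retraining, which is what "zero-shot" refers to here.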

CLIP also boasted a long lineage of research papers at CVPR. Some exciting research working on foundational embedding models for computer vision at CVPR included:

Next year, there will inevitably be significant progress and focus on this front, and we can expect some exciting new foundational CV models to be released.

Machine Learning Technique, Tactics, and Tasks

While the conference hall was full of discussion about general models, the core body of CVPR research in 2023 involved more traditional work in techniques and tasks in computer vision.

Research advanced in tasks like NeRFs (neural radiance fields), pose estimation, and tracking, with new methods proposed for each.

General machine learning techniques advanced as well, with researchers working on both the theory of machine learning and the empirical results that improve training routines.

We were particularly excited about the following practical machine learning research:

Industry vs Research: A Notable Divide

A physical divide between the research poster sessions and company booths underpinned an intellectual divide between the future of the field and what is practical today.

While the research posters and workshop sessions focused primarily on vision transformers, the industry booths sported Python snippets wrapping YOLO models.

We were really excited to see significant industrial progress being made by companies working on data annotation services, cloud compute, and model acceleration.

Conclusion

It has never been a more exciting time to be working in computer vision. CVPR 2023 showcased many important moments from the year in our field. Multi-modal models promise a new foundation, and practical progress is taking computer vision into a new phase of industry adoption.

You can view the full list of CVPR 2023 research papers here: https://openaccess.thecvf.com/CVPR2023?day=all

Cite this Post

Use the following entry to cite this post in your research:

Jacob Solawetz. (Jun 23, 2023). CVPR 2023 Highlights. Roboflow Blog: https://blog.roboflow.com/cvpr-2023-highlights/
