What Is YOLO27?

Published Jun 10, 2026 • 4 min read

YOLO27, expected at YOLO Vision 2026 in Shenzhen, expands the YOLO family into 3D perception with two models: YOLO-Depth for monocular depth estimation from a single camera, and YOLO-StereoDepth for binocular disparity depth, positioned as a camera-native alternative to lidar. Earlier YOLO generations worked in two dimensions; depth adds how far away an object is, which matters for robotics and warehouse automation. Until YOLO27 ships, the post outlines alternatives: RF-DETR for detection and segmentation, Depth Anything V2 for monocular depth, and SAM 3 for segmentation.

YOLO models are a family of real-time computer vision models designed to handle a wide range of tasks, including object detection, segmentation, pose estimation, classification, and oriented object detection.

YOLO27, the next generation of the YOLO family, is expected to be announced at YOLO Vision 2026 in Shenzhen, China, with a release planned for September 2026.

In this post, we'll cover what YOLO27 is, what the new 3D perception models mean for real-world applications, what remains unknown ahead of release, and what you can use today for detection and depth.

What Is YOLO27?

YOLO27 is the next generation of the YOLO model family, expected to be announced at YOLO Vision 2026 in Shenzhen. Where YOLO26 focused on edge optimization, NMS-free end-to-end inference, and faster CPU performance, YOLO27 expands the family in a new direction: 3D perception.

Two new models have been named so far:

YOLO-Depth: monocular depth estimation from a single camera
YOLO-StereoDepth: binocular disparity depth for robotics, positioned as a camera-native alternative to lidar

Every prior YOLO generation worked in two dimensions, answering what is in the image and where it is on the image plane.

Depth estimation adds the third dimension: how far away each thing is. That is the difference between detecting a pallet and knowing a forklift can reach it, or detecting an obstacle and knowing a robot has three meters to stop.

YOLO-Depth: Monocular Depth Estimation

YOLO-Depth predicts depth from a single camera. Monocular depth estimation takes a standard 2D image and produces a depth map, an image where every pixel value corresponds to distance from the camera.

The appeal of monocular depth is hardware cost. A single RGB camera is the cheapest, most widely deployed sensor in the world. If a model can extract usable depth from cameras already installed on a production line, in a warehouse, or on a vehicle, teams get 3D understanding without new sensors or new capex.

The tradeoff, historically, is that monocular depth is relative rather than absolute. Models like Depth Anything V2 predict which pixels are closer and which are farther with impressive consistency, but converting that to real-world units requires calibration. Whether YOLO-Depth changes that tradeoff is an open question.
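To make the calibration step concrete, here is a minimal sketch of one common approach: fitting a scale and shift that map relative depth values onto a few measured distances. It assumes the relative prediction is related to metric depth by a simple affine map, which is an illustrative assumption rather than a property of any particular model (some relative-depth models predict inverse depth, in which case the fit would be done in inverse-depth space). The function names and the reference measurements are hypothetical.

```python
import numpy as np

def fit_scale_shift(relative, metric_refs):
    """Least-squares fit of metric ~= scale * relative + shift.

    relative:    1-D array of relative depth values at reference pixels
    metric_refs: 1-D array of measured distances (meters) at the same pixels
    """
    relative = np.asarray(relative, dtype=float)
    metric_refs = np.asarray(metric_refs, dtype=float)
    # Design matrix [relative, 1], so the solution vector is (scale, shift).
    A = np.stack([relative, np.ones_like(relative)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, metric_refs, rcond=None)
    return scale, shift

def to_metric(relative_map, scale, shift):
    """Apply the fitted affine map to a whole relative depth map."""
    return scale * np.asarray(relative_map, dtype=float) + shift

# Hypothetical calibration: three pixels whose distances were measured by hand.
rel = np.array([0.2, 0.5, 0.9])
meters = np.array([1.0, 2.5, 4.5])
scale, shift = fit_scale_shift(rel, meters)

# Convert a small (made-up) relative depth map to approximate meters.
metric_map = to_metric(np.array([[0.2, 0.4], [0.6, 0.9]]), scale, shift)
```

A fit like this is only as good as the reference points, and it generally needs to be redone when the camera or scene changes, which is why absolute depth from a single camera remains the harder problem.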

YOLO-StereoDepth: A Camera-Native Alternative to Lidar

YOLO-StereoDepth uses binocular disparity, the same principle as human vision. Two cameras at a known distance apart capture the same scene, and the model computes depth from the difference between the two views. Because the camera baseline is known, stereo depth produces absolute distances rather than relative ones.
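The geometry behind this is the standard rectified-stereo relation: depth equals focal length times baseline divided by disparity. The sketch below illustrates that textbook formula only; it is not a description of how YOLO-StereoDepth works internally, and the focal length, baseline, and disparity values are made up for the example.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity map (pixels) to depth (meters) for a rectified pair.

    depth = focal_px * baseline_m / disparity_px
    Pixels with zero or negative disparity have no valid match and map to inf.
    """
    disparity_px = np.asarray(disparity_px, dtype=float)
    depth = np.full(disparity_px.shape, np.inf)
    valid = disparity_px > 0
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth

# Illustrative values: 700 px focal length, cameras 10 cm apart.
disparity = np.array([[70.0, 35.0],
                      [14.0, 0.0]])
depth_m = disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.10)
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant objects than for near ones, which is one reason long range is harder for stereo systems.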

YOLO-StereoDepth is positioned as a camera-native alternative to lidar for robotics. Lidar still holds advantages in low light, low-texture scenes, and long range, so camera-native depth is unlikely to replace it everywhere. But for cost-sensitive robotics, AMRs, and indoor automation, a strong stereo depth model running on commodity hardware would expand what is practical to build.

What We Don't Know Yet About YOLO27

As of this writing, we have not seen:

Benchmarks: no accuracy or latency numbers for YOLO-Depth or YOLO-StereoDepth, and no comparisons against existing depth models like Depth Anything V2
Model sizes: no confirmation of the Nano through Extra Large variant lineup used in previous generations
Task coverage: whether YOLO27 also updates the core 2D tasks (detection, segmentation, pose, classification, OBB) or whether the 3D models sit alongside YOLO26 for those
Licensing: YOLO27 licensing terms have not been announced. Comparable earlier releases shipped under AGPL-3.0, which requires you to open-source derivative works unless you purchase a commercial license. If you are evaluating models for commercial deployment, confirm the terms before you build on it.
A paper: no plans for a formal YOLO27 research paper have been indicated.

We will update this post as more information is released.

How to Use Depth Estimation Today

You do not need to wait for YOLO27 to add depth to a vision pipeline. Depth Anything V2, a state-of-the-art monocular depth estimation model, is available today as a block in Roboflow Workflows. You can chain it with detection and segmentation models to measure object distance from a camera, build depth-aware effects, or add spatial reasoning to robotics applications, all from a single RGB camera.

RF-DETR is faster and more accurate than YOLO26 for object detection and instance segmentation, and it ships with commercial-safe licensing. Pair it with Depth Anything V2 in a single Workflow for detection plus depth today.
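As a rough illustration of what "detection plus depth" means in practice, the sketch below takes bounding boxes and a depth map as plain arrays and reports a median depth per box. It does not call any Roboflow, RF-DETR, or Depth Anything V2 API; the function name, box format (x_min, y_min, x_max, y_max in pixels), and sample data are assumptions made for the example. With a relative depth model, the values would be unitless unless calibrated first.

```python
import numpy as np

def box_depths(depth_map, boxes):
    """Median depth inside each (x_min, y_min, x_max, y_max) pixel box."""
    depth_map = np.asarray(depth_map, dtype=float)
    h, w = depth_map.shape
    results = []
    for x0, y0, x1, y1 in boxes:
        # Clamp the box to the image so slicing never goes out of bounds.
        x0, x1 = max(0, int(x0)), min(w, int(x1))
        y0, y1 = max(0, int(y0)), min(h, int(y1))
        patch = depth_map[y0:y1, x0:x1]
        patch = patch[np.isfinite(patch)]  # ignore invalid depth pixels
        results.append(float(np.median(patch)) if patch.size else float("nan"))
    return results

# Made-up scene: background at 5.0, one nearer object at 2.0.
depth = np.full((4, 6), 5.0)
depth[1:3, 1:3] = 2.0
boxes = [(1, 1, 3, 3), (3, 0, 6, 4)]  # hypothetical detector output
distances = box_depths(depth, boxes)
```

The median is used rather than the mean so that background pixels caught inside a loose box have less influence on the reported distance.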

YOLO27 Alternatives

While YOLO27 is not yet available, several models cover the same ground today and are actively benchmarked on the object detection leaderboard.

RF-DETR

RF-DETR, developed by Roboflow, is a family of real-time models supporting object detection, segmentation, and classification. RF-DETR outperforms YOLO26 across benchmarks and generalizes well across domains, and it is small enough to run on the edge using Inference. Core models (Nano through Large) and all code are released under the Apache 2.0 license. For teams choosing a detection or segmentation model right now, RF-DETR is the model we recommend.

Depth Anything V2

Depth Anything V2 is the current standard for monocular depth estimation, trained with a teacher-student pipeline across tens of millions of images. It generalizes well to real-world scenes without camera-specific calibration and runs in Roboflow Workflows today. When YOLO-Depth ships, this is the model its benchmarks will be measured against.

SAM 3

SAM 3 handles promptable segmentation across open-vocabulary inputs, useful when the objects you care about were not in your training set.

YOLO27 Conclusion

YOLO27 reflects where vision applications are heading: from understanding what is in a frame to understanding the physical space around a camera. That matters for robotics, logistics, manufacturing, and any system that needs to act in the real world, not just observe it.

Until release, the practical path to depth-aware applications is available now: RF-DETR for detection and segmentation, Depth Anything V2 for depth, combined in Roboflow Workflows and deployed to cloud, edge, or on-prem.

Cite this Post

Use the following entry to cite this post in your research:

Contributing Writer. (Jun 10, 2026). What Is YOLO27?. Roboflow Blog: https://blog.roboflow.com/what-is-yolo27/

Written by

Contributing Writer
