Ultimate Guide to Converting Bounding Boxes, Masks and Polygons

Published Aug 15, 2023 • 5 min read

In this article, we cover several useful conversions between bounding boxes, masks, and polygons. All three are commonly used annotation formats in computer vision, but converting between them usually requires writing custom scripts. Using supervision, we will demonstrate a simple way to complete your conversions.

What are Bounding Boxes, Polygons, and Masks?

Bounding boxes (xyxy) are the annotation format most commonly associated with computer vision. They are used for object detection, where a model learns to label objects with boxes.

Polygon annotations serve a similar purpose in instance segmentation, where a model also learns to label objects, but with polygons (complex shapes) rather than boxes.

Masks are similar to polygons in that they mark objects or regions in an image, but a mask is a binary, pixel-level representation, with 1s for object/region pixels and 0s for background/unrelated pixels.
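
To make the three formats concrete, here is a minimal NumPy sketch that describes the same rectangular object in each representation. The image size and coordinates are made up for illustration and do not come from a real dataset.

```python
import numpy as np

# A hypothetical image, 8 pixels wide and 6 pixels tall, with one rectangular object.
width, height = 8, 6

# Bounding box in xyxy format: [x_min, y_min, x_max, y_max]
box_xyxy = np.array([2, 1, 5, 4])

# Polygon: an (N, 2) array of (x, y) vertices tracing the object's outline
polygon = np.array([[2, 1], [5, 1], [5, 4], [2, 4]])

# Mask: a (height, width) boolean array, True where a pixel belongs to the object
mask = np.zeros((height, width), dtype=bool)
mask[1:5, 2:6] = True  # rows index y, columns index x

print(box_xyxy.shape, polygon.shape, mask.shape)
```

Note that the mask is indexed as (row, column), that is (y, x), while boxes and polygons store x first.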

In this guide, we will show how to:

Convert a Polygon to Bounding Boxes (xyxy)
Convert a Polygon to a Mask
Convert a Mask to a Bounding Box (xyxy)
Convert a Mask to a Polygon

Let’s begin!

💡

Want to try the code or see more technical details? Check out our accompanying Colab notebook!

Importing Your Detections Into Supervision

In this guide, we will be using an open-source computer vision utility called Supervision. Supervision supports various import formats, like Inference, Ultralytics YOLOv8, and Azure Image Analysis.

For our example, we'll import our prediction results from Roboflow's hosted inference API:

# `model` and `test_image_url` are assumed to be defined earlier (see the Colab notebook)
prediction = model.predict(test_image_url, hosted=True).json()
detections = sv.Detections.from_inference(prediction)

💡

For more information and code examples for different import formats, see the Supervision Detections API documentation.

The Detections object has the xyxy (bounding box) and mask properties, among others, that we will reference in this post.

How to Convert a Polygon to Bounding Boxes (polygon to xyxy)

Oftentimes, instance segmentation can be slower and more complex than object detection. If you don’t need the extra precision and detail of polygon predictions, it might be best to annotate with polygons (read our blog post as to why) and then convert to bounding boxes for training object detection models.

Here’s how we can convert polygon data into bounding box data:

Method 1: Use the `supervision.polygon_to_xyxy` utility

In this method, we use the polygon_to_xyxy function to convert a raw array of polygon vertices into a bounding box.

# Import Supervision
import supervision as sv

# Convert each polygon in the array of polygons to bounding boxes
bounding_boxes = [ sv.polygon_to_xyxy(p) for p in polygons ]

Our polygons array is a NumPy array holding multiple polygons, so we iterate through it and convert each polygon individually.
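
As a rough illustration of what this conversion computes, the sketch below takes the minimum and maximum of the vertex coordinates. The helper name `polygon_to_xyxy_sketch` and the sample polygons are hypothetical; this is a NumPy stand-in for the idea, not supervision's own code.

```python
import numpy as np

def polygon_to_xyxy_sketch(polygon: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: tightest axis-aligned box around (N, 2) vertices."""
    x_min, y_min = polygon.min(axis=0)
    x_max, y_max = polygon.max(axis=0)
    return np.array([x_min, y_min, x_max, y_max])

# Made-up polygons with different vertex counts, kept in a list of (N, 2) arrays
polygons = [
    np.array([[10, 20], [40, 25], [30, 60]]),
    np.array([[5, 5], [15, 5], [15, 15], [5, 15]]),
]

# One [x_min, y_min, x_max, y_max] row per polygon
bounding_boxes = np.array([polygon_to_xyxy_sketch(p) for p in polygons])
print(bounding_boxes)
```

Because polygons usually have different numbers of vertices, a list of per-polygon arrays is often easier to work with than a single stacked array.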

Method 2: Import into supervision and export from the xyxy property

First, we import supervision. Then, we import the polygon data that we’d like to convert.

In this example, we use the inference result from Roboflow’s hosted inference API, but there are tons more import options. See all of them on the supervision docs.

We then export our bounding box data in [x1, y1, x2, y2] format from the xyxy property.

# Import Supervision
import supervision as sv

# Import polygon data
detections = sv.Detections.from_inference(prediction)

# Export as xyxy data
bounding_boxes = detections.xyxy

How to Convert a Polygon to a Mask

Polygon data, which is often used both as an annotation format and as an inference export format, can be converted into masks, which can then be used for training semantic segmentation models. Since polygons are shapes made of straight line segments, masks can often be more useful for capturing the details of an object’s shape.

Here’s how we can convert polygon data into mask data:

Method 1: Use the `supervision.polygon_to_mask` utility

In this method, we use the polygon_to_mask function to convert a raw array of polygon vertices into masks.

# Import Supervision
import supervision as sv

# Convert each polygon in the array of polygons to masks
# (width and height are the pixel dimensions of the source image)
masks = [ sv.polygon_to_mask(p, (width, height)) for p in polygons ]

Our polygons array is an ndarray for multiple polygons, which is why we iterate through the polygons array.
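
To show what rasterizing a polygon involves, here is a small NumPy sketch that marks a pixel as inside when its center passes the even-odd (ray casting) test. The helper name `polygon_to_mask_sketch` and the sample polygon are hypothetical, and the library's rasterizer may treat pixels on the polygon boundary differently, so treat this as an illustration of the idea rather than a drop-in replacement.

```python
import numpy as np

def polygon_to_mask_sketch(polygon: np.ndarray, resolution_wh: tuple) -> np.ndarray:
    """Hypothetical stand-in: fill an (N, 2) polygon into a (height, width) boolean mask."""
    width, height = resolution_wh
    # Coordinates of every pixel center; both arrays have shape (height, width)
    ys, xs = np.mgrid[0:height, 0:width]
    xs = xs + 0.5
    ys = ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    # Even-odd rule: toggle a pixel for each edge crossed by a ray cast to its right
    for (x1, y1), (x2, y2) in zip(polygon, np.roll(polygon, -1, axis=0)):
        if y1 == y2:
            continue  # a horizontal edge never crosses a horizontal ray
        crosses = (y1 > ys) != (y2 > ys)
        x_at_y = x1 + (ys - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (xs < x_at_y)
    return inside

# Made-up rectangle on an image 8 pixels wide and 6 pixels tall
polygon = np.array([[2, 1], [6, 1], [6, 5], [2, 5]])
mask = polygon_to_mask_sketch(polygon, (8, 6))
print(mask.astype(int))
```

The resolution is passed as (width, height), but the resulting mask has shape (height, width). Mixing up the two is an easy mistake to make with non-square images.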

Method 2: Import into supervision and export from the mask property

After importing supervision, we can import our detections from a source.

In the following example, we use the inference result from Roboflow’s hosted inference API, but there are tons more import options. See all of them on the supervision docs.

Then, we can get the masks from the mask property of the detections object.

# Import Supervision
import supervision as sv

# Import polygon data
detections = sv.Detections.from_inference(prediction)

# Export from detections as a mask
masks = detections.mask

How to Convert a Mask to a Bounding Box (mask to xyxy)

Semantic segmentation and instance segmentation models are generally slower than bounding box-based object detection models, so converting mask data to bounding boxes might be beneficial. Further, since masks contain pixel-level data, storing data in a bounding box format can have efficiency and storage benefits as well.

Here’s how we can convert mask data into bounding box data:

Method 1: Use the `supervision.mask_to_xyxy` utility

In this method, we use the mask_to_xyxy function to convert a mask into xyxy bounding box coordinates.

# Import Supervision
import supervision as sv

# Convert an array of masks to bounding boxes
bounding_boxes = sv.mask_to_xyxy(masks)
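
For intuition, the sketch below derives a box from each mask by finding the smallest and largest row and column that contain a True pixel. The helper name `mask_to_xyxy_sketch` and the sample masks are hypothetical, and the choice to leave an all-zero box for an empty mask is an assumption of this sketch.

```python
import numpy as np

def mask_to_xyxy_sketch(masks: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: one [x_min, y_min, x_max, y_max] row per (H, W) mask."""
    boxes = np.zeros((len(masks), 4), dtype=int)
    for i, mask in enumerate(masks):
        rows, cols = np.where(mask)
        if len(rows) == 0:
            continue  # assumption: an empty mask keeps the all-zero box
        boxes[i] = [cols.min(), rows.min(), cols.max(), rows.max()]
    return boxes

# Two made-up masks stacked into an (N, height, width) boolean array
masks = np.zeros((2, 6, 8), dtype=bool)
masks[0, 1:5, 2:6] = True
masks[1, 0:2, 5:8] = True

bounding_boxes = mask_to_xyxy_sketch(masks)
print(bounding_boxes)
```

In this sketch the box corners are pixel indices, so x_max and y_max point at the last pixel that belongs to the object.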

Method 2: Import detections into supervision and export from the xyxy property

After importing supervision, we import our detections from a source.

In this example, we use the inference result from Roboflow’s hosted inference API, but there are tons more import options. See all of them on the supervision docs.

Then, we can get the bounding boxes from the xyxy property of the detections object.

# Import Supervision
import supervision as sv

# Import mask data
detections = sv.Detections.from_inference(prediction)

# Export from detections as bounding box data
bounding_boxes = detections.xyxy

How to Convert a Mask to a Polygon

There are many situations in which you may want to convert a mask to a polygon. For instance, you may want to convert a binary mask produced by a segmentation model to a polygon annotation in an automated labeling system.

Polygons are similar to masks in that they denote a specific area on an image. However, a polygon is a list of coordinate points, whereas a mask is an array the same size as the image, where each pixel is either part of the mask or not.

For a mask-to-polygon conversion, we use the supervision.mask_to_polygons() function to convert our masks.

In this example, we use an inference result from Roboflow’s hosted inference API, but there are tons more import options. See all of them on the supervision docs.

If we have multiple masks, we will have to iterate through them, like we do in the example.

# Import Supervision
import supervision as sv

# Import mask data (optional if you have raw mask data)
detections = sv.Detections.from_inference(prediction)

# Convert each mask to a list of polygons
polygons = [ sv.mask_to_polygons(m) for m in detections.mask ]
# for raw mask data: polygons = sv.mask_to_polygons(mask)
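
A single mask can contain several disconnected regions, so the conversion can yield more than one polygon per mask. If you only want the main outline, one option is to keep the polygon with the largest area. The sketch below does that with the shoelace formula; the helper names and the sample polygons are hypothetical and stand in for the output of one mask.

```python
import numpy as np

def polygon_area(polygon: np.ndarray) -> float:
    """Area of an (N, 2) polygon via the shoelace formula."""
    x, y = polygon[:, 0], polygon[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def largest_polygon(polygons: list) -> np.ndarray:
    """Return the polygon that covers the most area."""
    return max(polygons, key=polygon_area)

# Made-up polygons for one mask: a large rectangle and a small stray triangle
polygons_for_one_mask = [
    np.array([[0, 0], [4, 0], [4, 3], [0, 3]]),
    np.array([[10, 10], [12, 10], [11, 12]]),
]

main_outline = largest_polygon(polygons_for_one_mask)
print(main_outline)
```

Whether dropping the smaller regions is appropriate depends on your data. For occluded objects, the smaller polygons may be legitimate parts of the same instance.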

Conclusion

That’s it! 🎉 In this guide, we covered a variety of useful conversions between bounding boxes, masks, and polygon data structures. Each task, which previously would’ve involved writing lengthy scripts, can now be done in one or two lines of concise code.

Cite this Post

Use the following entry to cite this post in your research:

Leo Ueno. (Aug 15, 2023). Ultimate Guide to Converting Bounding Boxes, Masks and Polygons. Roboflow Blog: https://blog.roboflow.com/convert-bboxes-masks-polygons/

Written by

Leo Ueno

ML Growth Associate @ Roboflow | Sharing the magic of computer vision | leoueno.com