Ultimate Guide to Converting Bounding Boxes, Masks and Polygons
In this article, we cover several useful conversions between bounding boxes, masks, and polygons. All three are commonly used formats in computer vision, but converting between them usually requires writing custom scripts. Using supervision, we will demonstrate a simple way to complete your conversions.
What are Bounding Boxes, Polygons, and Masks?
Bounding boxes (xyxy) are the annotation format most commonly associated with computer vision. They are used for object detection, where a model learns to label objects with boxes. The xyxy notation refers to the [x1, y1, x2, y2] coordinates of a box's top-left and bottom-right corners.
Polygon annotations are similar, but they are used for instance segmentation, where a model learns to label objects with polygons (complex shapes) rather than boxes.
Masks are similar to polygons since they can show objects or regions on an image, but masks are a binary pixel representation of an image, with 1s for object/region pixels and 0s for background/unrelated pixels.
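To make the three formats concrete, here is a minimal NumPy sketch of one rectangular object on a hypothetical 6x5 image. The values and variable names are made up for illustration, and treating the box corners as inclusive pixel indices is an assumption of this sketch.

```python
import numpy as np

# Hypothetical image size, chosen only for illustration
width, height = 6, 5

# Bounding box: four numbers, [x1, y1, x2, y2]
xyxy = np.array([1, 1, 4, 3])

# Polygon: an ordered list of (x, y) vertices, shape (num_points, 2)
polygon = np.array([[1, 1], [4, 1], [4, 3], [1, 3]])

# Mask: one boolean per pixel, shape (height, width)
mask = np.zeros((height, width), dtype=bool)
mask[1:4, 1:5] = True  # rows 1-3 (y), columns 1-4 (x)

print(xyxy.shape, polygon.shape, mask.shape)
```

Note that the mask is indexed as (row, column), i.e. (y, x), while boxes and polygons list x first.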
In this guide, we will cover:
- How to Convert a Polygon to Bounding Boxes (xyxy)
- How to Convert a Polygon to a Mask
- How to Convert a Mask to Bounding Box (xyxy)
- How to Convert a Mask to a Polygon
Let’s begin!
Importing Your Detections Into Supervision
In this guide, we will be using an open-source computer vision utility called Supervision. Supervision supports various import formats, like Inference, Ultralytics YOLOv8, and Azure Image Analysis.
For our example, we'll import our prediction results from Roboflow's hosted inference API:
import supervision as sv
# Run hosted inference, then load the JSON result into Supervision
prediction = model.predict(test_image_url, hosted=True).json()
detections = sv.Detections.from_inference(prediction)
The Detections object has the xyxy (bounding box) and mask properties, among others, that we will reference in this post.
How to Convert a Polygon to Bounding Boxes (polygon to xyxy)
Instance segmentation is often slower and more complex than object detection. If you don’t need the extra precision and detail of polygon predictions, it might be best to annotate with polygons (read our blog post as to why) and then convert them to bounding boxes for training object detection models.
Here’s how we can convert polygon data into bounding box data:
Method 1: Use the supervision.polygon_to_xyxy utility
In this method, we use the polygon_to_xyxy function to convert a raw array of polygon vertices into a bounding box.
# Import Supervision
import supervision as sv
# Convert each polygon in the array of polygons to bounding boxes
bounding_boxes = [ sv.polygon_to_xyxy(p) for p in polygons ]
Our polygons array is a NumPy array containing multiple polygons, which is why we iterate through it and convert one polygon at a time.
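To show what this conversion computes, here is a minimal NumPy sketch, assuming each polygon is an (N, 2) array of (x, y) vertices. The function name polygon_to_box is hypothetical and the sketch is not taken from supervision's source; it only illustrates that the box is the minimum and maximum of the vertex coordinates.

```python
import numpy as np

def polygon_to_box(polygon: np.ndarray) -> np.ndarray:
    """Return [x_min, y_min, x_max, y_max] for an (N, 2) array of (x, y) vertices."""
    x_min, y_min = polygon.min(axis=0)
    x_max, y_max = polygon.max(axis=0)
    return np.array([x_min, y_min, x_max, y_max])

# Hypothetical triangle, used only as sample input
triangle = np.array([[2, 8], [9, 3], [5, 1]])
box = polygon_to_box(triangle)
print(box)
```

The conversion is lossy: many different polygons share the same bounding box, so there is no way to recover the original polygon from the box alone.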
Method 2: Import into supervision and export from the xyxy property
First, we import supervision. Then, we import the polygon data that we’d like to convert.
In this example, we use the inference result from Roboflow’s hosted inference API, but there are tons more import options. See all of them on the supervision docs.
We then export our bounding box data in [x1, y1, x2, y2] format from the xyxy property.
# Import Supervision
import supervision as sv
# Import polygon data
detections = sv.Detections.from_inference(prediction)
# Export as xyxy data
bounding_boxes = detections.xyxy
How to Convert a Polygon to a Mask
Polygon data, which is often used both as an annotation format and as an inference export format, can be converted into masks, which can be used for training semantic segmentation models. Since a polygon is a shape made of straight line segments, masks can often be more useful for capturing the details of an object’s shape.
Here’s how we can convert polygon data into mask data:
Method 1: Use the supervision.polygon_to_mask utility
In this method, we use the polygon_to_mask function to convert a raw array of polygon vertices into masks.
# Import Supervision
import supervision as sv
# Convert each polygon to a mask; the resolution is passed as (width, height)
masks = [sv.polygon_to_mask(p, (width, height)) for p in polygons]
Our polygons array is an ndarray for multiple polygons, which is why we iterate through the polygons array.
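For intuition about what a polygon-to-mask conversion involves, here is a rough NumPy sketch that marks every pixel whose center lies inside the polygon, using an even-odd ray-casting test. The function name rasterize_polygon is hypothetical, and this is not how supervision implements the conversion; boundary pixels in particular may be treated differently by other rasterizers.

```python
import numpy as np

def rasterize_polygon(polygon: np.ndarray, resolution_wh: tuple) -> np.ndarray:
    """Return a (height, width) uint8 mask with 1s where pixel centers fall inside the polygon."""
    width, height = resolution_wh
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # pixel-center coordinates
    inside = np.zeros((height, width), dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if y1 == y2:
            continue  # horizontal edges never cross a horizontal ray
        # Does a ray cast to the right from each pixel center cross this edge?
        crosses = (y1 > py) != (y2 > py)
        x_at_y = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        # Each crossing flips the inside/outside state (even-odd rule)
        inside ^= crosses & (px < x_at_y)
    return inside.astype(np.uint8)

# Hypothetical rectangle on a 6x5 image
square = np.array([[1, 1], [4, 1], [4, 3], [1, 3]])
mask = rasterize_polygon(square, (6, 5))
print(mask)
```

The resolution goes in as (width, height), but the resulting array has shape (height, width), which is an easy detail to mix up when the image is not square.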
Method 2: Import into supervision and export from the mask property
After importing supervision, we can import our detections from a source.
In the following example, we use the inference result from Roboflow’s hosted inference API, but there are tons more import options. See all of them on the supervision docs.
Then, we can get the masks from the mask property of the detections object.
# Import Supervision
import supervision as sv
# Import polygon data
detections = sv.Detections.from_inference(prediction)
# Export from detections as a mask
masks = detections.mask
How to Convert a Mask to Bounding Box (mask to xyxy)
Semantic segmentation and instance segmentation models are generally slower than bounding box-based object detection models, so converting mask data to bounding boxes might be beneficial. Further, since masks contain pixel-level data, storing data in a bounding box format can have efficiency and storage benefits as well.
Here’s how we can convert mask data into bounding box data:
Method 1: Use the supervision.mask_to_xyxy utility
In this method, we use the mask_to_xyxy function to convert a mask into xyxy bounding box coordinates.
# Import Supervision
import supervision as sv
# Convert the array of masks to bounding boxes
bounding_boxes = sv.mask_to_xyxy(masks)
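As an illustration of what a mask-to-box conversion does, here is a minimal NumPy sketch, assuming the masks are stacked in a single array of shape (N, H, W). The function name masks_to_boxes is hypothetical, and the choice to return an all-zero box for an empty mask is an assumption of this sketch.

```python
import numpy as np

def masks_to_boxes(masks: np.ndarray) -> np.ndarray:
    """Compute [x_min, y_min, x_max, y_max] for each mask in an (N, H, W) array."""
    boxes = np.zeros((len(masks), 4), dtype=int)
    for i, mask in enumerate(masks):
        rows, cols = np.nonzero(mask)
        if rows.size == 0:
            continue  # empty mask: leave the box as zeros
        boxes[i] = [cols.min(), rows.min(), cols.max(), rows.max()]
    return boxes

# Hypothetical single mask on a 6x5 image
single_mask = np.zeros((5, 6), dtype=bool)
single_mask[1:3, 2:5] = True

# Add a leading axis so that one mask becomes a batch of shape (1, H, W)
boxes = masks_to_boxes(single_mask[np.newaxis, ...])
print(boxes)
```

If your masks are stored as a Python list of 2D arrays with the same shape, np.stack(masks) turns them into the (N, H, W) array this sketch expects.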
Method 2: Import detections into supervision and export from the xyxy property
After importing supervision, we import our detections from a source.
In this example, we use the inference result from Roboflow’s hosted inference API, but there are tons more import options. See all of them on the supervision docs.
Then, we can get the bounding boxes from the xyxy property of the detections object.
# Import Supervision
import supervision as sv
# Import mask data
detections = sv.Detections.from_inference(prediction)
# Export from detections as bounding box data
bounding_boxes = detections.xyxy
How to Convert a Mask to a Polygon
There are many situations in which you may want to convert a mask to a polygon. For instance, you may want to convert a binary mask produced by a segmentation model to a polygon annotation in an automated labeling system.
Polygons are similar to masks in that they denote a specific area in an image. However, a polygon is a list of coordinate points, whereas a mask is an array the same size as the image, where each pixel is either part of the mask or not.
For a mask-to-polygon conversion, we use the supervision.mask_to_polygons() function to convert our masks.
In this example, we use an inference result from Roboflow’s hosted inference API, but there are tons more import options. See all of them on the supervision docs.
If we have multiple masks, we will have to iterate through them, like we do in the example.
# Import Supervision
import supervision as sv
# Import mask data (optional if you have raw mask data)
detections = sv.Detections.from_inference(prediction)
# Convert each mask to a list of polygons
polygons = [ sv.mask_to_polygons(m) for m in detections.mask ]
# for raw mask data: polygons = sv.mask_to_polygons(mask)
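A single mask can contain more than one disconnected region, so mask_to_polygons returns a list of polygons for each mask rather than a single polygon. If you need exactly one polygon per object, one option is to keep the largest. The sketch below assumes each polygon is an (N, 2) array of (x, y) vertices; polygon_area and largest_polygon are hypothetical helpers, not part of supervision.

```python
import numpy as np

def polygon_area(polygon: np.ndarray) -> float:
    """Area of a simple polygon via the shoelace formula."""
    x, y = polygon[:, 0], polygon[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def largest_polygon(polygons: list) -> np.ndarray:
    """Return the polygon with the greatest area from a non-empty list."""
    return max(polygons, key=polygon_area)

# Hypothetical output for one mask: a main region plus a small stray region
candidates = [
    np.array([[0, 0], [1, 0], [0, 1]]),
    np.array([[0, 0], [4, 0], [4, 4], [0, 4]]),
]
main = largest_polygon(candidates)
print(polygon_area(main))
```

Whether dropping the smaller regions is acceptable depends on your data; for objects that are partially occluded, the extra polygons may be part of the same object.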
Conclusion
That’s it! 🎉 In this guide, we covered a variety of useful conversions between bounding boxes, masks, and polygon data structures. Each task, which previously would’ve involved writing lengthy scripts, can now be done in one or two lines of concise code.