Learning Outcomes 💫
By the end of this blog post, you will be able to…
- Know what features make a rectangle or polygon for computer vision applications
- Understand how to calculate distance between two points / objects in an image or video
- Convert pixel distance to real world measurements
- Conceptualize how estimating distance can be applied to other use cases
Using Object Detection for Measuring Distance
When a machine learning model detects an object, there is a lot of math that occurs in the background to make it happen. Concepts like gradient descent, backpropagation, activation functions, summing junctions, and more all compound into the ability to detect an object in an image.
While we don’t need to dive into the math of how we detect the object, we will briefly explore what types of inferences there are and how we can use that information to find distance.
Bounding Boxes for Distance Coordinates in Images
Probably the most prolific and known type of inference is object detection. Object detection uses bounding boxes which consist of information to draw a rectangle around the object of interest.
Some of this information includes the pixel width and height, class name as well as x and y coordinates. We can then use the size information along with the coordinates to determine where all four sides of the rectangle should be drawn.
Using JSON Outputs to Draw Bounding Boxes
To draw a rectangle using cv2.rectange(), you need the top-left corner and bottom-right corner of the rectangle.
This is a JSON example of an object detection response using Roboflow API:
We can then take this JSON information and convert it using Python into bounding box coordinates. With these coordinates, we can use OpenCV to draw a rectangle where an object is on the image, like so.
Instance Segmentation for More Precise Object Detection
Instance segmentation, an advanced method of segmentation, deals with locating instances of objects, defining their bounds, and assigning a class name to that object.
Whereas traditional segmentation focused exclusively on regionality of an object, instance segmentation combines features from object detection and classification to supercharge its inference capabilities.
Understanding a Polygon as an Array of X and Y Coordinates
A polygon is a closed, two-dimensional, flat or planar structure that is circumscribed by straight sides. There are no curves on its sides. The edges of a polygon are another name for its sides. A polygon's vertices (or points) are the places where two sides converge.
This is an example of an array of x and y coordinates from inference segmentation that makes up the points of a polygon:
In computer vision, we can use these points to draw the polygon by feeding it to a function like cv2.polylines() in python. This will take the points above and draw lines from each coordinate to the next until a fully closed polygon is built as displayed below.
Calculating Distance Between Points for Computer Vision
To calculate distance between two points we can use the Pythagorean Theorem to solve for any side of the polygon.
Often in computer vision problems you are solving for side c, because you are trying to calculate the distance between two object points that are on different horizontal and vertical planes.
By using the coordinates (x1, y1) representing the right most point on the triangle and coordinates (x2, y2) representing the top most point.
We can use a^2 + b^2 = c^2 to determine the distance between two points.
Pixel Distance Coding Example for a Triangle
Using the above triangle, let's see an example:
Using Computer Vision to Measure Distance in Images
In the real world, we don’t measure things using pixels so we need to convert this pixel distance into something humans can use.
Since I’m coding out of the USA, we will show the process for converting pixels to inches. You are more than welcome to use centimeters or any other measurement you would like. The only thing you need to know is the real world measurements of one of the objects you are trying to detect. Then you can use that ratio to measure other objects or lines you draw on the screen.
So how would we go about measuring the distance between these two cars? Here’s a link to the car detection dataset and model that we will use if you'd like to follow along.
Converting Pixel Distance to Inches
In our demo example using the triangle, the pixel distance was approximately 39px. In our real world case, after running inference and using the Pythagorean Theorem, we can determine that the current pixel distance is 479px from one box of the car to another.
Next we need to know the approximate size of at least one of the objects in the photo. Fortunately, using Roboflow’s inference API, we can determine that the Hyundai Elantra is approximately 1590px wide. We also know that the Kia Rio is approximately 1423px wide.
Now that we have the approximate pixel distance between the cars, and all the car’s pixel width, we will use the manufacturing documentation to conclude how wide the Hyundai Elantra and Kia Rio are in inches. The Elantra is about 179.1 inches and the Rio is 160 inches.
We can take the pixel distance of the cars and the inches distance of the same object to create a relatively accurate pixel to inches ratio.
1590px / 179.1 inches = 8.877px/inch
1423px / 160 inches = 8.893px/inch
Average Pixel Ratio = int((8.877+8.893) / 2)
Average Pixel Ratio = 8px/inch
Since we can’t measure or draw in between pixels, we have chosen to round down. Making the pixels_to_inches ratio 8px per inch. For your distance measuring use case, you may choose to round up.
Inference was run on a photo resolution of 4032 x 3024 pixels. The higher the resolution of the photo, the more accurate the distance measurement should be because the more pixels you have to work with.
Converting pixel distance to inches example:
Applying the Pixel Ratio Concept to Measure Between Bounding Boxes
Now that we know every 8px equals an inch, we can start to represent the distance between the cars in inches by using our pixel ratio.
479px / 8px = 59.875 inches
Cars at Approximately 5 Feet Apart
Testing the Pixel Ratio Concept at a New Distance
After successfully calculating the distance between the cars at 5 feet. Let's move the cars to an approximate distance of 10 feet and reapply the calculations.
The inference API tells us that the Elantra is approximately 1462px wide and the Rio is about 1209px wide. By dividing the px width by the known inches width of the cars, we can determine the pixel ratio once more.
1462px / 179.1 inches = 8.16px/inch
1209px / 160 inches = 7.556px/inch
Average Pixel Ratio = int((8.16+7.556) / 2)
Average Pixel Ratio = 7px/inch
Using the trigonometry from before, we can determine the distance between the boxes of the cars is approximately 756px.
756px / 7px = 108 inches
Cars at Approximately 10 Feet Apart
Measuring Distance with Computer Vision Summary
This solution showed that it is technically possible to use pixels to measure the distance between objects in a photo.
There are some nuances, like image resolution, to be aware of when using this in practice. You can think of pixels as lines on a ruler. The more pixels you have, the more lines you will be able to see on the ruler. Thus, the more pixels you have, the more likely the measurement will be accurate.
Another learning takeaway is bounding box tightness. Since our use case relies on using the current pixel width to calculate the pixel to inches ratio, we need the boxes to be tightly bound. Having loose bounding boxes can lead to the pixel ratio being over or underestimated, which will skew the results away from an accurate measurement. We recommend using labeling best practices to maintain high quality bounding boxes.
Lastly, for projects where the camera is stationary, you can be more confident that this would work as a solution for measuring distance. Not having to account for the z axis (camera position) means that it is more likely the pixel ratio will hold across measurements. The problem becomes more difficult if you try to measure objects at different camera positions or angles.
Thank you for reading and best of luck for those of you interested in applying this concept on your own projects. Drop any questions (or examples!) in the forum.