In work environments where employees are around moving equipment – from vehicles to automated systems – it is essential that all safety rules and regulations are followed.
One use of computer vision is detecting when people enter a restricted zone. The same system can count how many people are present so that the zone does not become overcrowded.
In this tutorial, we will cover how to create your own real-time person detection model as well as add zone monitoring abilities to the system.
To build this application, we will follow these steps:
- Train a person detection model
- Install and import libraries
- Define a zone of interest from a reference image
- Define color annotators
- Write logic to monitor when people are in a zone
- Test our program
Create a Model
To get started, create a free Roboflow account. Then, click “Create Project” to create a new project. Set a name for your project and choose the “Object Detection” project type:
Next, add your images. The images I used can be downloaded through this link. Download the dataset and save the files somewhere you can easily find them.
Add the downloaded images to your dataset and continue:
Then, add the classes you want your model to detect. For our use case, we only need one class: person.
Now that we have our annotations and images, we can generate a dataset version of the labeled images. Each version is unique and associated with a trained model, so you can iterate on augmentation and data experiments.
Install and Import Libraries
First, install the required libraries by running the following command. The leading ! is notebook syntax; omit it when running the command in a terminal:
!pip install supervision inference
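Before moving on, it can help to confirm that the install succeeded. The following is a minimal sketch using only the standard library. The missing_packages helper is a hypothetical name, not part of Supervision or Inference, and the sketch assumes that cv2 was pulled in as a dependency of the packages above.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level modules this tutorial imports.
# Assumption: cv2 is provided by an OpenCV package installed as a dependency.
required = ["supervision", "inference", "cv2", "numpy"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, rerun the install command in the same environment that runs your script.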
Next, create a new Python file and import the following libraries into your script:
import supervision as sv
import cv2
from typing import Union, List, Optional
from inference.core.interfaces.camera.entities import VideoFrame
from inference import InferencePipeline
import numpy as np
Create Zone From Image
To track when people enter and exit a zone, we need to define exactly what zone we want to track. Using Polygon Zone, we can drag and drop an image and create our preferred zone.
Open Polygon Zone, drag the image you want to use into the editor, then click to draw a polygon around the area you want to track:
Copy the NumPy points into your program:
zone = np.array(
    [
        [426, 228],
        [358, 393],
        [367, 434],
        [392, 464],
        [427, 486],
        [479, 492],
        [533, 504],
        [895, 511],
        [872, 243],
        [429, 226],
    ]
)
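The zone coordinates are pixel positions in the reference image, so they only line up with the video if both share the same resolution. If they differ, one option is to rescale the points. Below is a minimal sketch with plain Python lists. The scale_polygon helper and the example resolutions are hypothetical and not part of Supervision.

```python
def scale_polygon(points, source_wh, target_wh):
    """Rescale [x, y] pixel points from one image resolution to another."""
    scale_x = target_wh[0] / source_wh[0]
    scale_y = target_wh[1] / source_wh[1]
    return [[round(x * scale_x), round(y * scale_y)] for x, y in points]


# Assumed example: the zone was drawn on a 1280x720 image, the video is 1920x1080.
reference_points = [[426, 228], [358, 393], [895, 511], [872, 243]]
scaled_points = scale_polygon(reference_points, (1280, 720), (1920, 1080))
print(scaled_points)
```

The rescaled list can be wrapped in np.array(...) and used as the zone in the rest of the tutorial.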
To display the location of predictions from our model, we need to use annotators. Supervision is an all-in-one computer vision library with the annotation tools we need to highlight detected people. We can define the annotators with the following code snippet.
COLOR_ANNOTATOR = sv.ColorAnnotator()
LABEL_ANNOTATOR = sv.LabelAnnotator()
Create Zone Logic
We are now ready to define logic that tracks when people enter and exit a zone. For this, we will use the PolygonZone functionality in Supervision, an open-source Python package with utilities for working with computer vision models.
Here is the code we need:
def zone_logic(zone, detections, frame):
    # Build the zone from the polygon points defined earlier.
    polyzone = sv.PolygonZone(
        polygon=zone,
    )
    zone_annotated = sv.PolygonZoneAnnotator(
        zone=polyzone,
        color=sv.Color.RED,
        thickness=5,
    )

    people_in_box = 0
    # trigger() returns one boolean per detection: True if it is inside the zone.
    zone_presence = polyzone.trigger(detections)
    zone_present_idxs = [idx for idx, present in enumerate(zone_presence) if present]
    for detection in zone_present_idxs:
        people_in_box += 1

    # Draw the zone outline and the people count on the frame.
    annotated_frame = zone_annotated.annotate(
        scene=frame, label=f"People inside Zone: {people_in_box}"
    )
    return annotated_frame
Let's walk through the counting logic. First, we set the number of people in the box to 0. Next, we call polyzone.trigger, which reports which detected people are inside the zone. We then use a simple for loop that increments the people_in_box variable once for each person in the zone. Lastly, we use the zone annotator from Supervision to show how many people are inside the zone.
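To make the counting step concrete, here is a library-free sketch of the same idea: decide whether each person's anchor point falls inside the polygon, then count the hits. It uses a standard ray-casting test and a hypothetical count_people_in_zone helper. It is not how Supervision implements PolygonZone, and the anchor points and the MAX_PEOPLE limit are made-up values for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: inside if a ray cast to the right crosses an odd number of edges."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_people_in_zone(anchors, polygon):
    """Count anchor points (one per detected person) that lie inside the polygon."""
    return sum(point_in_polygon(x, y, polygon) for x, y in anchors)


# Same polygon as the tutorial's zone, written as a plain list.
zone_points = [
    [426, 228], [358, 393], [367, 434], [392, 464], [427, 486],
    [479, 492], [533, 504], [895, 511], [872, 243], [429, 226],
]
# Hypothetical anchor points, e.g. the bottom center of each person's box.
anchors = [(600, 400), (100, 100), (950, 400)]
MAX_PEOPLE = 1  # hypothetical occupancy limit

people = count_people_in_zone(anchors, zone_points)
print(f"People inside zone: {people}")
if people > MAX_PEOPLE:
    print("Zone is over its occupancy limit")
```

A check like the one against MAX_PEOPLE is one way to act on the count, for example to raise an alert when the zone gets too crowded.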
Finally, we need to define a function that lets us run inference with our model. This function should take in a dictionary as a prediction (the format from Roboflow models) and a VideoFrame as the video frame.
Here is the code we need:
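def on_prediction(
    predictions: Union[dict, List[Optional[dict]]],
    video_frame: Union[VideoFrame, List[Optional[VideoFrame]]],
) -> None:
    # A single video source may deliver one prediction and one frame rather
    # than lists, so normalize both to lists before iterating.
    if not isinstance(predictions, list):
        predictions = [predictions]
        video_frame = [video_frame]

    for prediction, frame in zip(predictions, video_frame):
        if prediction is None:
            continue

        image = frame.image
        detections = sv.Detections.from_inference(prediction)

        # Draw a colored overlay and a label for each detection.
        annotated_frame = image
        annotated_frame = COLOR_ANNOTATOR.annotate(
            scene=annotated_frame, detections=detections
        )
        annotated_frame = LABEL_ANNOTATOR.annotate(
            scene=annotated_frame,
            detections=detections,
        )

        # Draw the zone and the people count onto the same frame.
        zone_logic(zone, detections, annotated_frame)

        cv2.imshow("frame", annotated_frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break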
In this function, we loop through each prediction and grab the frame.
After getting the image, we convert the prediction into detections and use them to annotate the image with the previously defined COLOR_ANNOTATOR and LABEL_ANNOTATOR.
Next, we call the zone logic function and show the annotated frame.
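For reference, a Roboflow object detection response stores each box as a center point plus a width and height. The sketch below converts such a dictionary into corner coordinates and a bottom-center anchor point. The sample values and the boxes_from_prediction helper are hypothetical; in the tutorial, this conversion is handled by sv.Detections.from_inference.

```python
def boxes_from_prediction(prediction):
    """Convert center-format boxes into corner coordinates plus a bottom-center anchor."""
    boxes = []
    for item in prediction.get("predictions", []):
        half_w = item["width"] / 2
        half_h = item["height"] / 2
        x_min, y_min = item["x"] - half_w, item["y"] - half_h
        x_max, y_max = item["x"] + half_w, item["y"] + half_h
        boxes.append(
            {
                "class": item["class"],
                "confidence": item["confidence"],
                "xyxy": (x_min, y_min, x_max, y_max),
                # Bottom center of the box, roughly where a person's feet are.
                "anchor": (item["x"], y_max),
            }
        )
    return boxes


# Made-up response containing a single detection.
sample_prediction = {
    "predictions": [
        {"x": 600.0, "y": 350.0, "width": 80.0, "height": 200.0,
         "confidence": 0.91, "class": "person"}
    ]
}
for box in boxes_from_prediction(sample_prediction):
    print(box["class"], box["xyxy"], box["anchor"])
```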
Finally, we connect all of this together by calling the model we trained on Roboflow. Use the following code snippet to start the pipeline. Make sure to replace the necessary information with your own.
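pipeline = InferencePipeline.init(
    video_reference="VIDEO",  # replace with your video source
    model_id="MODEL_ID",  # replace with your model ID
    max_fps=60,
    confidence=CONFIDENCE,  # replace with a threshold between 0 and 1
    api_key="API_KEY",  # replace with your Roboflow API key
    on_prediction=on_prediction,
)
pipeline.start()
pipeline.join()  # wait for the pipeline to finish processing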
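One way to avoid hard-coding these values is to read them from environment variables. This is a sketch, not part of Inference: the load_pipeline_settings helper, the variable names ROBOFLOW_API_KEY, MODEL_ID, VIDEO_REFERENCE, and CONFIDENCE, and the 0.5 default are assumptions chosen for illustration.

```python
import os


def load_pipeline_settings(env=None):
    """Collect pipeline settings from a mapping (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env
    # Assumed default threshold of 0.5 when CONFIDENCE is not set.
    confidence = float(env.get("CONFIDENCE", "0.5"))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("CONFIDENCE must be between 0 and 1")
    return {
        "video_reference": env.get("VIDEO_REFERENCE", "VIDEO"),
        "model_id": env.get("MODEL_ID", "MODEL_ID"),
        "api_key": env.get("ROBOFLOW_API_KEY", "API_KEY"),
        "confidence": confidence,
    }


# Hypothetical values passed as a plain dict instead of real environment variables.
settings = load_pipeline_settings({"MODEL_ID": "my-project/1", "CONFIDENCE": "0.4"})
print(settings)
```

The resulting dictionary could then be unpacked into the call, for example InferencePipeline.init(**settings, max_fps=60, on_prediction=on_prediction).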
Conclusion
In this guide, we learned how to create a real-time person detection model and use it for zone monitoring. Using Roboflow, we trained our own model and deployed it with an InferencePipeline. For more tutorials like this one, visit the Roboflow Blog.