
Introduction

Computer vision is a useful tool for understanding and quantifying real-world activity as it happens. Tracking human movements with pose estimation is a common way to evaluate athletics or general body movement and gain insight into proper form and technique. This guide will show you how to use keypoint detection (pose estimation) models to build custom computer vision applications.

You’ll learn how to build your own computer vision model and effectively implement a computer vision workflow.

Before we start, let's break down the steps for the project.

The steps:

  • Create a custom vision model
  • Create a workflow
  • Download needed libraries
  • Import needed libraries
  • Get model keypoints
  • Get MediaPipe detections
  • Create the main prediction function
  • Add deployment code

Step 1. Create a Roboflow Model

First, sign up for Roboflow and create a free account.

Next, go to Workspaces and create a Project. Customize the project name and annotation group as you like, and make sure to choose Keypoint Detection as the project type.

Next, add your images. Use Roboflow Universe, the world's largest collection of open source computer vision datasets and APIs, to find a dataset if you don’t have time to gather your own data.

Then, add the classes you want your model to detect. Name the class, then create the two points you need for your project. In this project there are two points: Left marks the left side of the weight and Right marks the right side.

Next, start annotating your dataset. We recommend getting at least 50 annotated images before training your first model.

Draw the annotations and repeat this step for each image. Make sure the keypoints sit on each side of the weights.

Lastly, generate a dataset version of your labeled images. Each version is unique and associated with a trained model so you can iterate on augmentation and data experiments.

Step 2. Create a workflow

With the model we created, we can use Roboflow Workflows, a low-code tool for building computer vision pipelines. Workflows help streamline the application-building process by making it easy to combine models and custom logic.

To start, navigate to the workflows tab on the dashboard and create a workflow.

Select the option to create a custom workflow. 

Next, select a keypoint detection model block from the sidebar.

Then, select the model you want to use. In this case, we will use olympics-2/1.

Lastly, save the workflow and get the deployment code (both available at the top right of the screen).

Step 3. Download needed libraries

Now that we have the model, we can download helpful libraries. Make sure you have the latest versions to avoid errors. Note that installing OpenCV may take a while due to wheel installation.

!pip install opencv-python numpy supervision inference mediapipe
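
Optionally, you can confirm the installed versions before moving on, since the KeyPoints.from_mediapipe helper used later in this guide requires a recent release of supervision. This quick check is just a convenience snippet, not part of the original pipeline:

from importlib.metadata import version

# Optional sanity check: print installed versions of the packages used in this guide
for package in ("opencv-python", "numpy", "supervision", "inference", "mediapipe"):
    print(package, version(package))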

Step 4. Import needed libraries

After installing the libraries, we need to import the ones we will use.

import cv2
from inference.core.interfaces.camera.entities import VideoFrame
from inference import InferencePipeline
import supervision as sv
import mediapipe as mp
import numpy as np

Step 5. Get Model Keypoints

Using the model we previously created, we now need to extract useful data from it. We can accomplish this with the following code snippet.

This function:

  • Takes the result of our model
  • Gets the detection information (position, class, confidence, etc.)
  • Returns the values as an sv.KeyPoints object
def from_workflows(result):
    # If the workflow returned no predictions, return an empty KeyPoints object
    if "predictions" not in result:
        return sv.KeyPoints.empty()
    detections = result["predictions"]["predictions"]
    xy = detections.data["keypoints_xy"].astype(np.float32)
    class_id = detections.data["keypoints_class_id"].ravel().astype(np.int_)[: len(xy)]
    confidence = detections.data["keypoints_confidence"].astype(np.float32)
    return sv.KeyPoints(
        xy=xy,
        confidence=confidence,
        class_id=class_id,
    )

Step 6. Get MediaPipe Detections

MediaPipe will help us plot the joints of the person. With the release of Supervision 0.22.0, we can seamlessly draw MediaPipe detections onto our frame with little code.

The following code snippet:

  • Initializes the MediaPipe pose model (separate from the one we created) as well as the edge annotator
  • Gets the results from the model
  • Draws the joints on the frame using Supervision's edge annotator
  • Converts the left hip and right hip landmarks to pixel coordinates and returns them for future pose calculations
mp_pose = mp.solutions.pose
model = mp_pose.Pose()
edge_annotator2 = sv.EdgeAnnotator(color=sv.Color.BLACK, thickness=5)
def Get_Mediapipe(image):
    # MediaPipe expects RGB input, while OpenCV frames are BGR
    results = model.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        # No person detected in this frame
        return None
    key_points = sv.KeyPoints.from_mediapipe(results, resolution_wh=image.shape[1::-1])
    # Draw the pose skeleton directly onto the frame (annotates in place)
    edge_annotator2.annotate(scene=image, key_points=key_points)
    # Convert the normalized hip landmarks to pixel coordinates for later pose calculations
    height, width = image.shape[:2]
    left_hip = results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_HIP]
    right_hip = results.pose_landmarks.landmark[mp_pose.PoseLandmark.RIGHT_HIP]
    left_hip.x, left_hip.y = int(left_hip.x * width), int(left_hip.y * height)
    right_hip.x, right_hip.y = int(right_hip.x * width), int(right_hip.y * height)
    return left_hip, right_hip
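If you want to sanity-check this function on its own, here is a small hypothetical usage example. The image path is a placeholder, and the hip-midpoint calculation is just one example of a downstream pose calculation you might run:

# Hypothetical usage: run Get_Mediapipe on a single test frame
frame = cv2.imread("test_frame.jpg")  # placeholder path
hips = Get_Mediapipe(frame)
if hips is not None:
    left_hip, right_hip = hips
    # Example downstream calculation: the hip midpoint height in pixels
    hip_center_y = (left_hip.y + right_hip.y) / 2
    print(f"Hip midpoint y: {hip_center_y:.0f}px")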

Step 7. Create Main Prediction Function

By creating one main function, we make our prediction code much more organized. In this step, we combine both of the previous functions with some additional logic.

This code snippet:

  • Defines needed annotators 
  • Gets the image from the video frame
  • Calls both functions
  • Draws the keypoints returned by from_workflows
  • Displays the annotated frame with cv2.imshow
vertex_annotator1 = sv.VertexAnnotator(radius=8)
edge_annotator1 = sv.EdgeAnnotator(thickness=4, edges=[(0, 1)])
def on_prediction(res: dict, frame: VideoFrame) -> None:
    image = frame.image
    annotated_frame = image.copy()
    # Draw the MediaPipe skeleton onto the frame (annotates in place)
    Get_Mediapipe(annotated_frame)
    # Extract the weight keypoints from the workflow result
    keypoints = from_workflows(res)
    annotated_frame = edge_annotator1.annotate(
        scene=annotated_frame, key_points=keypoints
    )
    annotated_frame = vertex_annotator1.annotate(
        scene=annotated_frame, key_points=keypoints
    )
    # Show the annotated frame
    cv2.imshow("frame", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        return

Step 8. Add Deployment Code

Lastly, grab the previously obtained deployment code from Roboflow Workflows and add it to the bottom of your script.
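
The exact snippet is generated for you on the Workflows deploy panel, but it generally follows the shape below. This is only a rough sketch: the API key, workspace name, workflow ID, and video path are placeholders that you should replace with the values from your own deployment code.

# Sketch only: copy the real deployment code from your Workflow's deploy panel.
# The API key, workspace name, workflow ID, and video path below are placeholders.
pipeline = InferencePipeline.init_with_workflow(
    api_key="YOUR_API_KEY",
    workspace_name="your-workspace",
    workflow_id="your-workflow-id",
    video_reference="path/to/your/video.mp4",  # or 0 for a webcam
    on_prediction=on_prediction,
)
pipeline.start()
pipeline.join()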

With the pipeline running, the project is complete. Your output video frames should look similar to this:


Conclusion

In this guide, we successfully deployed a Roboflow model in a Workflow with MediaPipe integrations. We also utilized Workflows, a low-code tool that simplifies the creation of computer vision applications.