How to Use a Gaze Detection API

Did you know that your eye movements can be used as input for a computer, just like a mouse? Thanks to computer vision, controlling computers by looking in different directions is possible. Gaze detection can determine the direction in which a person is looking and can enable various applications.

Image recognition and other deep learning methods can be used to analyze eye movements and identify the direction and point of focus of a person’s gaze. For example, L2CS-Net, a state-of-the-art model, uses a convolutional neural network (CNN) to predict the two gaze angles, yaw and pitch, separately. Specialized loss functions that combine classification and regression improve the model’s ability to estimate gaze direction in diverse, unconstrained environments.
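
To make the classification-plus-regression idea concrete, here is a minimal, hypothetical sketch of such a loss in PyTorch. It is not L2CS-Net’s actual implementation; the bin count, bin width, and loss weighting below are illustrative assumptions. The angle range is discretized into bins, a cross-entropy term supervises the bin scores, and the expected angle recovered from the softmax is supervised with a mean squared error term.

import torch
import torch.nn as nn

class CombinedGazeLoss(nn.Module):
    # Hypothetical classification + regression loss for one gaze angle (yaw or pitch).
    # Bin count, bin width, and regression weight are illustrative assumptions.
    def __init__(self, num_bins=90, bin_width_deg=4.0, reg_weight=1.0):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.mse = nn.MSELoss()
        self.reg_weight = reg_weight
        self.angle_min = -num_bins * bin_width_deg / 2  # e.g. -180 degrees
        self.bin_width = bin_width_deg
        centers = self.angle_min + (torch.arange(num_bins) + 0.5) * bin_width_deg
        self.register_buffer("bin_centers", centers)  # bin centers in degrees

    def forward(self, logits, angle_deg):
        # logits: (batch, num_bins) raw bin scores; angle_deg: (batch,) ground-truth angle
        bin_idx = ((angle_deg - self.angle_min) / self.bin_width).long()
        bin_idx = bin_idx.clamp(0, logits.shape[1] - 1)
        cls_loss = self.ce(logits, bin_idx)  # classification over angle bins
        expected = (torch.softmax(logits, dim=1) * self.bin_centers).sum(dim=1)
        reg_loss = self.mse(expected, angle_deg)  # regression on the expected angle
        return cls_loss + self.reg_weight * reg_loss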

This is useful because gaze detection has many different applications. For example, you can use gaze detection to give people with disabilities a way to use a computer without a keyboard or mouse. You can also check whether an online exam is being conducted fairly by making sure the test-taker stays focused on the screen and doesn’t look at external material.

Here is an example of a gaze detection API running:

In this article, we’ll explore the concept of using a gaze detection API to detect the direction a person is looking and discuss its importance in various use cases. We’ll also guide you through a coding example that shows how to use Roboflow’s Gaze Detection API on a video file. Let’s get started!

Understanding Gaze Detection APIs

Gaze detection APIs work by processing images or video frames of a person with image processing algorithms that identify facial features and track the movement of the person’s eyes. After processing, the API sends back details about where the person is looking.
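
For example, the Roboflow Gaze Detection API used later in this article returns, for each detected face, the face bounding box, facial landmark keypoints, and the yaw and pitch angles of the gaze in radians. An illustrative (and not necessarily complete) response with made-up values might look roughly like this:

[
  {
    "predictions": [
      {
        "face": {
          "x": 320.0,
          "y": 240.0,
          "width": 180.0,
          "height": 180.0,
          "landmarks": [{"x": 290.0, "y": 220.0}, {"x": 350.0, "y": 220.0}]
        },
        "yaw": -0.12,
        "pitch": 0.05
      }
    ]
  }
]

Here, "x" and "y" are the center of the face bounding box, which is how the drawing code later in this article interprets them.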

Such APIs have applications in research and development. For example, they can be used to track user attention on websites, helping developers understand which parts or features of a site are most engaging to users. These APIs can also be integrated into devices that help people with disabilities, such as paralysis, communicate with the rest of the world.

In the next section, we’ll learn how to use Roboflow’s Gaze Detection API to process a video file and visualize the results.

Setting Up and Using the Local Roboflow Gaze Detection API

Let’s take a closer look at how you can use Roboflow’s API to get gaze detection results. We’ll walk you through the code step by step. To try using gaze detection, you’ll need a Roboflow API key and an input video file. You can refer to this documentation on how to retrieve an API key (make sure to open a Roboflow account first). We’ll be using the video file below as input.

Step #1: Install the Dependencies 

First, you’ll need to install all the required packages. For this guide, we’ll be using the Docker version of Inference to run a server that returns predictions. If you do not have Docker installed on your system, follow the official Docker installation instructions. To install the packages, run the pip command below:

pip install inference inference-cli inference-sdk

Run this command to start the Inference server in Docker:

inference server start

Step #2: Initialize the API with the API Key

The code snippet below imports the needed packages and initializes the API key and the gaze detection server URL used to retrieve predictions. We’ll also create a function that sends a single frame to the server and returns the predictions; later, we’ll call it on each frame of our video. Make sure to replace "Your API Key" with your actual API key.

import base64
import cv2
import numpy as np
import requests

API_KEY = "Your API Key"
GAZE_DETECTION_URL = f"http://127.0.0.1:9001/gaze/gaze_detection?api_key={API_KEY}"

def detect_gazes(frame: np.ndarray):
    # Encode the frame as JPEG, then base64, so it can be sent in a JSON request
    img_encode = cv2.imencode(".jpg", frame)[1]
    img_base64 = base64.b64encode(img_encode)
    # Send the encoded frame to the gaze detection endpoint
    resp = requests.post(
        GAZE_DETECTION_URL,
        json={"image": {"type": "base64", "value": img_base64.decode("utf-8")}},
    )
    # The response contains one entry per image; return that entry's predictions
    return resp.json()[0]["predictions"]
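
Before moving on to video, you can sanity-check the function on a single still image. The image path below is a hypothetical placeholder; replace it with a photo that contains a face:

# Quick sanity check on a single image (hypothetical path, replace with your own)
test_frame = cv2.imread("test_image.jpg")
predictions = detect_gazes(test_frame)
if predictions:
    print("yaw:", predictions[0]["yaw"], "pitch:", predictions[0]["pitch"])
else:
    print("No face detected in the test image.")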

Step #3: Visualize Gaze

Next, we’ll visualize the predictions. For that, we’ll create a function that draws a bounding box around the face, an arrow pointing in the direction the person is looking, and the facial keypoints used to locate the eyes. We’ll also draw the yaw and pitch values above the bounding box.

def draw_gaze(img: np.ndarray, gaze: dict):
    # Draw face bounding box
    face = gaze["face"]
    x_min = int(face["x"] - face["width"] / 2)
    x_max = int(face["x"] + face["width"] / 2)
    y_min = int(face["y"] - face["height"] / 2)
    y_max = int(face["y"] + face["height"] / 2)
    cv2.rectangle(img, (x_min, y_min), (x_max, y_max), (255, 0, 0), 3)

    # Draw gaze arrow (yaw and pitch are returned in radians)
    _, imgW = img.shape[:2]  # image shape is (height, width), so this grabs the width
    arrow_length = imgW / 2
    # Project the gaze direction onto the image plane to get the arrow offsets
    dx = -arrow_length * np.sin(gaze["yaw"]) * np.cos(gaze["pitch"])
    dy = -arrow_length * np.sin(gaze["pitch"])
    cv2.arrowedLine(
        img,
        (int(face["x"]), int(face["y"])),
        (int(face["x"] + dx), int(face["y"] + dy)),
        (0, 0, 255),
        2,
        cv2.LINE_AA,
        tipLength=0.18,
    )

    # Draw keypoints
    for keypoint in face["landmarks"]:
        x, y = int(keypoint["x"]), int(keypoint["y"])
        cv2.circle(img, (x, y), 2, (0, 255, 0), 2)

    # Draw label
    label = f"yaw {gaze['yaw']:.2f}  pitch {gaze['pitch']:.2f}"
    cv2.putText(img, label, (x_min, y_min - 10), cv2.FONT_HERSHEY_PLAIN, 3, (0, 255, 0), 3)

Step #4: Run Gaze Detection

Here, we’ll process the video file and pass it frame by frame to the functions above. Make sure to replace "Path / to / Video_File.mp4" with your actual video file path. We’ll also create a video writer object to save the output as "output_gaze_detection.avi". The code will also display each processed frame with the visualized predictions so you can inspect the results.

if __name__ == "__main__":
    vid_path = "Path / to / Video_File.mp4"
    cap = cv2.VideoCapture(vid_path)
    assert cap.isOpened(), "Error reading video file"

    # Get video properties
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Video writer setup (MJPG codec to match the .avi container)
    video_writer = cv2.VideoWriter(
        "output_gaze_detection.avi",
        cv2.VideoWriter_fourcc(*"MJPG"),
        fps,
        (frame_width, frame_height)
    )

    while cap.isOpened():
        success, frame = cap.read()
        if not success:
            print("Video frame is empty or video processing has been successfully completed.")
            break

        # Run gaze detection on the frame and visualize the first detected gaze
        gazes = detect_gazes(frame)
        if gazes:
            draw_gaze(frame, gazes[0])

        # Display and write the frame
        cv2.imshow("gaze", frame)
        video_writer.write(frame)

        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    video_writer.release()
    cv2.destroyAllWindows()
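
Once you have yaw and pitch values for each frame, you can build simple logic on top of them. For instance, for the exam proctoring use case mentioned earlier, you could flag frames where the person appears to be looking away from the screen. This is only a rough sketch: the thresholds below are illustrative assumptions (the API returns angles in radians) and would need tuning for your camera setup.

LOOK_AWAY_YAW = 0.5    # about 29 degrees; illustrative threshold, tune for your setup
LOOK_AWAY_PITCH = 0.4  # about 23 degrees; illustrative threshold, tune for your setup

def is_looking_away(gaze: dict) -> bool:
    # Flag the frame if the gaze deviates strongly from the camera on either axis
    return abs(gaze["yaw"]) > LOOK_AWAY_YAW or abs(gaze["pitch"]) > LOOK_AWAY_PITCH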

Here is the output video file with the direction of the person’s gaze marked by an arrow:

Here is another output that we got with a different input video file:

Challenges and Considerations 

Using APIs to detect the direction of someone's gaze is not always easy. The quality of the video plays a huge role. Higher resolution and frame rates can give more detailed information about facial features and the movement of the eyes.

Another factor to consider is the position of the user relative to the camera; a poorly angled camera or a long distance between the subject and the lens can reduce the accuracy of the system. Lighting conditions also matter.

Poor lighting can make it difficult for the API to accurately detect facial landmarks and eyes. For example, if you’re considering using the Gaze Detection API to track the eye movements of remote employees through their webcams, the lighting must be good enough; otherwise, detection may not work reliably.

Good vs bad lighting conditions for gaze tracking through webcam. (Source)
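
If lighting is a concern in your deployment, one simple safeguard is to check the average brightness of each frame before sending it to the API and skip or warn on frames that are too dark. This is a rough sketch that reuses the cv2 and numpy imports from the script above; the threshold is an illustrative assumption:

DARK_FRAME_THRESHOLD = 60  # mean grayscale intensity (0-255); illustrative, tune as needed

def frame_is_too_dark(frame: np.ndarray) -> bool:
    # Convert to grayscale and compare the average pixel intensity to the threshold
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return gray.mean() < DARK_FRAME_THRESHOLD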

Maintaining data privacy is another consideration. Gaze detection systems collect a lot of data, particularly in research that involves monitoring user behavior. The data gathered can be very sensitive, revealing a person’s interests, preferences, and habits, not to mention their facial feature data. To protect such information, it is important to implement robust data privacy measures that prevent misuse.

Applications and Use Cases

Now that we’ve seen how to run a gaze detection API and the challenges involved, let’s take a more in-depth look at how this technology can be applied in the real world.

User Experience Research

Gaze detection can be a powerful tool in UX research, offering valuable insights into user behavior by revealing where people focus on a webpage, application, or game.

By understanding where they look, researchers can identify what elements are most intuitive and what people may miss in a digital experience.

Gaze detection and tracking are being used for better user experience while gaming. (Source)

Assistive Technology 

Another application of gaze detection and tracking technology is assistive technology. Users with disabilities can use gaze detection systems combined with high-definition depth-sensing cameras to communicate and interact with the world around them. Real-world interaction becomes possible by integrating gaze-tracking systems into robotics: an assistive robotic arm, for example, can be controlled with such a system to reach for and manipulate objects.

A person with a disability using gaze detection to navigate digital devices. (Source)

Conclusion 

Gaze detection technology offers exciting possibilities across many fields. It can help us better understand how people use technology and enable people who cannot use input devices like keyboards and mice to operate computers.

Developers can also easily add this technology to their apps and systems using APIs like the one provided by Roboflow. While there are challenges like video quality, lighting, and privacy to consider, the benefits are also substantial. As computer vision advances, gaze detection will likely play a key role in shaping how we interact with computers and the world around us.

Further Reading

Here are some resources to continue exploring: