How to Build Gaze Control into Mobile Games
Published Dec 18, 2024 • 6 min read

Ever wanted to control a game just by moving your head? In this tutorial, I'll show you how to create a gaze-controlled Tetris game using Expo, React Native, and Roboflow Workflows.


Want to try it before continuing? You can download it from the App Store (iOS only for now).

What we're building

A Tetris game with two control modes:

  • Classic touch controls
  • Gaze controls (look to the left/right to move, up to rotate, down for soft drop)

The game - what do we need?

To build such a game, we will need to set up:

  • A mobile development environment.
  • The computer vision logic that runs on the camera feed and turns what it sees into control actions.
  • The game event loop to catch the player's actions and react to them.
  • The game logic itself - in this case, Tetris.

For that, we will be using Expo + React Native to develop and test our mobile app, RxJS to handle the game loop, and Roboflow Workflows to build and deploy the computer vision app.

Here is a high-level diagram of the final architecture for the game.

Architecture overview of the whole system.

Creating the mobile app development environment

To set up the mobile development environment we will use Expo. You can check the documentation on how to do that here. Expo will allow us to develop, test and deploy apps using React Native for multiple platforms - including Android and iOS.

Building and deploying the computer vision app

The problem we want to solve with computer vision is: given a frontal phone camera feed, how can we read a person's face and interpret their gaze as controller actions?

The Workflow

We can easily do that by using Workflows - it turns our model inferences and application logic into a single API call, removing all the complexity from the application layer. This is our Workflow:

Workflow to infer against a Gaze Detection model and then process the action.

It has two steps: Gaze Detection, which infers where the person is looking and returns angles like yaw and pitch, and the Controller, which interprets those values against thresholds and returns the resulting command: whether the person is looking to their left, right, up, or down.

For the Controller logic, we will use the flexibility of a Dynamic Python Block to write code that gets the values of yaw and pitch and compares them to give a face direction output.

Building the Dynamic Python Block (Gaze Tetris Controller)

This is an example of how we can benefit from having custom Python code running in our computer vision workflow. It gives us flexibility so we don't need to bring computer vision logic into our React application.

In the end, we will have an endpoint that receives an image and returns whether the intended action is left, right, up, or down.

# Function definition will be auto-generated based on inputs
# Expected function output: {"action": None}

def safe_get_first_list_value(my_list, default_value):
    if not isinstance(my_list, list):
        return my_list if my_list is not None else default_value
    if len(my_list) == 0:
        return default_value
    return my_list[0]

def run(self, yaw_degrees, pitch_degrees, left_threshold, right_threshold, up_threshold, down_threshold) -> BlockResult:
    # yaw means left and right
    # pitch means up and down
    # yaw and pitch come as arrays, but we only need the first value
    yaw = safe_get_first_list_value(yaw_degrees, 0)
    pitch = safe_get_first_list_value(pitch_degrees, 0) + 10 # correct because people often look down at their phones
    
    right_score = 0
    left_score = 0
    up_score = 0
    down_score = 0
    
    if yaw > 0 and yaw > right_threshold:
        right_score += yaw - right_threshold
    elif yaw < 0 and yaw < -left_threshold:
        left_score += abs(left_threshold + yaw)
    elif pitch > 0 and pitch > up_threshold:
        up_score += pitch - up_threshold
    elif pitch < 0 and pitch < -down_threshold:
        down_score += abs(down_threshold - pitch)
        
    # normalize scores relative to the threshold
    right_score = right_score / right_threshold
    left_score = left_score / left_threshold
    up_score = up_score / up_threshold
    down_score = down_score / down_threshold
        
    if right_score == left_score == up_score == down_score:
        return { "action": "center"}
        
    # returns the action with the highest score
    if right_score > left_score and right_score > up_score and right_score > down_score:
        return { "action":"looking_right"}
    elif left_score > right_score and left_score > up_score and left_score > down_score:
        return { "action":"looking_left"}
    elif up_score > right_score and up_score > left_score and up_score > down_score:
        return { "action":"looking_up"}
    elif down_score > right_score and down_score > left_score and down_score > up_score:
        return { "action":"looking_down"}
    return { "action": "center"}

This is how our Dynamic Python Block looks in the UI.

Dynamic Python Block configuration.
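Once the Workflow is deployed, the app only needs to care about the shape of the response it parses. Based on how we read it later in the React Native code (response.data.outputs[0].action) and the action strings returned by the Controller block, the response body looks roughly like this - treat the exact envelope as an assumption, since it depends on how you call the Workflow:

type GazeWorkflowResponse = {
  outputs: {
    // One of the strings returned by the Dynamic Python Block above
    action: "looking_left" | "looking_right" | "looking_up" | "looking_down" | "center";
  }[];
};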

The application

The application itself has two routes: the home screen with a "Play" button and the game screen. I spent some time making those components look better by adding some styling and animations.

The home screen

The home screen should be just a container with the game title and a play button. Here's a look at how it works:

import { View } from "react-native";

import Title from "./components/AnimatedTitle";
import PlayButton from "./components/AnimatedPlayButton";

export default function HomeScreen() {
  return (
    <View>
      <Title />
      <PlayButton />
    </View>
  );
}

In order to use the phone's camera, we need to ask for the user's permission when the component mounts. We can do that in the following way:

import { useEffect } from "react";
import { useCameraPermissions } from "expo-camera";

export default function HomeScreen() {
  const [permission, requestPermission] = useCameraPermissions();

  useEffect(() => {
    // Request camera permission on mount, if not granted yet
    if (!permission?.granted) {
      requestPermission();
    }
  }, [permission, requestPermission]);

  // ... the rest of our component
}

So the PlayButton component pushes to the game route!
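Here's a minimal sketch of what that could look like, assuming expo-router file-based routing; the "/game" route name and the bare styling are assumptions, not the exact component from the repo:

import { Pressable, Text } from "react-native";
import { useRouter } from "expo-router";

export default function PlayButton() {
  const router = useRouter();

  // Navigate to the game screen when the button is pressed
  return (
    <Pressable onPress={() => router.push("/game")}>
      <Text>Play</Text>
    </Pressable>
  );
}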

The game screen

The main game component combines two elements:

  1. Game state management using React's useReducer hook
  2. A reactive game loop powered by RxJS

Here's a look at how it works:

// 1. Game State Management
const [gameState, dispatch] = useReducer(gameReducer, initialState);

// 2. RxJS Game Loop
useEffect(() => {
  // Emit a tick every `dropSpeed` ms until the gameOver$ subject fires
  const gameLoop$ = interval(dropSpeed).pipe(
    takeUntil(gameOver$),
    filter(() => !gameState.isGameOver)
  );

  const subscription = gameLoop$.subscribe(() => {
    dispatch({ type: "MOVE_DOWN" });
  });

  // Clean up the subscription when the effect re-runs or the component unmounts
  return () => subscription.unsubscribe();
}, [dropSpeed, gameState.isGameOver]);

// 3. Control System
return (
  <View style={styles.container}>
    <TetrisBoard gameState={gameState} />
    
    {/* Show face controls or touch controls based on mode */}
    {showFacialControls ? (
      <CameraPreview />
    ) : (
      <TouchControls dispatch={dispatch} />
    )}
  </View>
);

The magic happens in how these pieces work together:

  • The game loop constantly moves pieces down
  • Player actions dispatch state updates (a minimal reducer sketch follows below)
  • The board re-renders smoothly with each state change
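
For context, here is a minimal sketch of what the gameReducer plumbing can look like. The MOVE_DOWN action type comes straight from the game loop above; the other action types and the simplified state shape are assumptions to illustrate the pattern, not the actual reducer from the repo (which also handles collisions, locking, and line clears):

// Heavily simplified state: just the active piece position and game-over flag
interface GameState {
  pieceX: number;
  pieceY: number;
  isGameOver: boolean;
}

type GameAction =
  | { type: "MOVE_DOWN" }
  | { type: "MOVE_LEFT" }
  | { type: "MOVE_RIGHT" }
  | { type: "ROTATE" };

function gameReducer(state: GameState, action: GameAction): GameState {
  // Ignore any input once the game is over
  if (state.isGameOver) return state;

  switch (action.type) {
    case "MOVE_DOWN":
      return { ...state, pieceY: state.pieceY + 1 };
    case "MOVE_LEFT":
      return { ...state, pieceX: state.pieceX - 1 };
    case "MOVE_RIGHT":
      return { ...state, pieceX: state.pieceX + 1 };
    case "ROTATE":
      // Rotation of the piece matrix is omitted in this sketch
      return state;
    default:
      return state;
  }
}

Every dispatched action produces a new state object, which is what lets the board re-render cleanly on each change.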

To catch the player's actions and dispatch them into the game loop, we need to look at the camera feed and interpret it. This is done in the CameraPreview component. It handles three tasks:

  1. Captures camera frames
  2. Sends them to our Workflow API
  3. Converts face movements into game actions

Here's a look at how it works:

function CameraPreview() {
  const cameraRef = useRef(null);

  // Capture frames every 100ms
  useEffect(() => {
    const subscription = interval(100).pipe(
      switchMap(async () => {
        // 1. Capture frame from the front camera as base64
        const photo = await cameraRef.current?.takePictureAsync({ base64: true });
        if (!photo?.base64) return;

        // 2. Send to workflow API
        const response = await axios.post(WORKFLOW_URL, {
          image: photo.base64
        });

        // 3. Convert API response to game action
        const gameAction = mapFaceDirectionToGameAction({
          action: response.data.outputs[0].action,
        });

        // 4. Send action to game
        if (gameAction) {
          gameActionSubject.next(gameAction);
        }
      })
    ).subscribe();

    // Stop capturing when the component unmounts
    return () => subscription.unsubscribe();
  }, []);

  return (
    <View>
      <CameraView ref={cameraRef} facing="front" />
    </View>
  );
}

This gives us a translator that gets the image input from the camera, calls an API, and sends an event to the game loop with the corresponding control.
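
Neither mapFaceDirectionToGameAction nor gameActionSubject is shown above, so here is a hedged sketch of how they can be wired together. The face-direction strings match what the Dynamic Python Block returns; the action type names (besides MOVE_DOWN) and the module layout are assumptions, not the repo's exact code:

import { Subject } from "rxjs";

// Same action shape as the reducer sketch above
type GameAction =
  | { type: "MOVE_DOWN" }
  | { type: "MOVE_LEFT" }
  | { type: "MOVE_RIGHT" }
  | { type: "ROTATE" };

// Shared subject: CameraPreview pushes actions in, the game screen listens
export const gameActionSubject = new Subject<GameAction>();

// Translate the Workflow's face-direction strings into game actions
export function mapFaceDirectionToGameAction({ action }: { action: string }): GameAction | null {
  switch (action) {
    case "looking_left":
      return { type: "MOVE_LEFT" };
    case "looking_right":
      return { type: "MOVE_RIGHT" };
    case "looking_up":
      return { type: "ROTATE" };
    case "looking_down":
      return { type: "MOVE_DOWN" };
    default:
      return null; // "center" (or anything unexpected) means: do nothing
  }
}

On the game screen side, a small effect can subscribe to the subject and forward each action into dispatch, closing the loop between the camera and the reducer (this snippet would live inside the game component, next to the reducer):

useEffect(() => {
  const subscription = gameActionSubject.subscribe((action) => dispatch(action));
  return () => subscription.unsubscribe();
}, []);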

Running everything together

After putting some animations and styling into the screens and wrapping everything up, here's the repository for the full app: https://github.com/joaomarcoscrs/face-tetris

Since the workflow has a Dynamic Python Block step, we need to run it against our own local inference server. You can check the documentation on how to deploy an inference server locally.

Once you're in the main repository folder, just run

npx expo install && npm start

Expo generates a QR code you can scan to run the app on your device.

If you want to give it a shot before running everything together, here's the link to the iOS app in the App Store.

After that, enjoy the game!

Cite this Post

Use the following entry to cite this post in your research:

Joao Marcos Cardoso Ramos da Silva. (Dec 18, 2024). How to Build Gaze Control into Mobile Games. Roboflow Blog: https://blog.roboflow.com/build-gaze-control-mobile-game/

Discuss this Post

If you have any questions about this blog post, start a discussion on the Roboflow Forum.

Written by

Joao Marcos Cardoso Ramos da Silva
