The article below was contributed by Timothy Malche, an assistant professor in the Department of Computer Applications at Manipal University Jaipur.

In this guide, we walk through how to build an application that uses computer vision to track how long you have been reading a book. The system works by detecting whether a book is open or closed. The project consists of a camera-based node that runs a computer vision model and a JavaScript MQTT subscriber client.

Here is a demo of the application:

On the left is the application that detects whether the book is open or closed. On the right is the time-tracking web application that measures how long you have been reading.

The camera node captures video of users reading books on a study table and analyzes it using a computer vision model hosted on Roboflow. The model determines whether the book is opened or closed, and this information is then published to an MQTT broker via a Python script.

A JavaScript client subscribes to the MQTT broker, receiving messages indicating the book's status. Based on the book’s status message, it calculates the reading duration. This automated system allows users to effortlessly monitor their reading time and habits, facilitating effective progress tracking and analysis.

This project has several applications. For example, by monitoring reading time, users gain insights into their reading habits, including frequency, duration, and consistency. Users can track their progress over time, setting goals and benchmarks for reading sessions. Furthermore, the tracking system provides motivation and accountability, encouraging users to stick to their reading schedules and meet their targets.

Based on tracked data, users can create personalized reading plans tailored to their goals and preferences. Finally, by understanding their reading patterns, users can optimize their reading experience, leading to better comprehension and retention of material.

How the System Works

The camera captures video frames of the user reading a book on a study table. A Python script sends each frame to a computer vision model hosted on Roboflow for inference. The model analyzes the frame and returns predictions indicating whether the book is opened or closed. The script then publishes the detected status to an MQTT broker.

The JavaScript code runs in a web browser and subscribes to the MQTT broker. It receives messages from the Python script indicating the book's status. When the status changes to opened, it records the start time of reading. When the status changes to closed, it calculates the reading duration by comparing the start and end times. The client updates the user interface to display the start time, end time, and duration of each reading session. The following image shows how the system works.

System Overview Diagram

Steps for Building the Project

Here are the steps to build the project:

  1. Collect and label a dataset of books
  2. Train an object detection model to detect books
  3. Run Inference to classify book reading status and send results
  4. Write a JavaScript application to calculate and display the book reading duration

Step #1: Collect and Label the Dataset

The first step in building our Book Reading Time Tracker is to collect a dataset of images. We need images of books in two states: “opened” and “closed”. You can use a camera or smartphone to capture these images. Aim for a diverse dataset with various angles, lighting conditions, and backgrounds. The following images show the two classes: opened and closed.

Book Dataset for “Opened” and “Closed” Classes

The images are then labelled using Roboflow’s labelling tool.

Dataset Labelling

Step #2: Train an Object Detection Model

After labeling was complete, I generated a version of the dataset and trained the model using the Roboflow auto-training option. For training, I chose the Roboflow 3.0 -> Accurate -> Train from Public Checkpoint options, as shown in the following images.

Roboflow Train: Choosing Roboflow 3.0 Option
Roboflow Train: Choosing Accurate Option
Roboflow Train: Choosing Public Checkpoint Option

The trained model had a high accuracy:

Training Metrics

The following graphs show how the model's metrics progressed during training:

Training Graphs

Step #3: Run Inference to Classify Book Reading Status and Send Results

The following code detects the book status and publishes the result over MQTT. It captures video from a webcam and runs object detection on each frame by sending the frame over HTTP to the model hosted on Roboflow that we trained in the previous step.

The detected object's class is published to an MQTT broker on the "book_status" topic. The script uses OpenCV for video processing, the Paho MQTT client for communication with the broker, and the InferenceHTTPClient from the inference_sdk for object detection. Each video frame is displayed with bounding boxes drawn around the detected objects, and the loop continues until the user presses 'q' to quit.

import cv2
import paho.mqtt.client as mqtt
from inference_sdk import InferenceHTTPClient

# Public HiveMQ test broker (unencrypted MQTT on port 1883)
broker_address = "broker.hivemq.com"
port = 1883

# Note: paho-mqtt 2.x requires a callback API version as the first argument,
# e.g. mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, client_id="book_reader_1")
client = mqtt.Client(client_id="book_reader_1")
client.connect(broker_address, port)
client.loop_start()  # Run the MQTT network loop in a background thread

# Initialize InferenceHTTPClient
CLIENT = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="ROBOFLOW_API_KEY"  # Replace with your Roboflow API key
)

video = cv2.VideoCapture(0)  # 0 selects the default webcam

while True:
    ret, frame = video.read()
    if not ret:
        break

    # Infer on the frame
    result = CLIENT.infer(frame, model_id="book-reading/1")
    detections = result['predictions']

    for bounding_box in detections:
        # Convert the center-based box (x, y, width, height) to corner coordinates
        x0 = int(bounding_box['x'] - bounding_box['width'] / 2)
        x1 = int(bounding_box['x'] + bounding_box['width'] / 2)
        y0 = int(bounding_box['y'] - bounding_box['height'] / 2)
        y1 = int(bounding_box['y'] + bounding_box['height'] / 2)
        class_name = bounding_box['class']
        confidence = bounding_box['confidence']

        client.publish("book_status", class_name)  # Publish detected class to the MQTT broker

        # Draw the bounding box and its label on the frame
        cv2.rectangle(frame, (x0, y0), (x1, y1), color=(0, 0, 255), thickness=1)
        cv2.putText(frame, f"{class_name} - {confidence:.2f}", (x0, y0 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    (0, 0, 255), 1)

    cv2.imshow('frame', frame)

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the camera, close the window, and disconnect from the broker
video.release()
cv2.destroyAllWindows()
client.loop_stop()
client.disconnect()
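
The script above publishes a message for every detection in every frame, and the web page ignores repeated statuses. As an optional refinement that is not part of the original script, the following sketch reports a status only when it changes, and only after the same class has been seen in several consecutive frames, so that a single misclassified frame does not end a reading session. The `StatusDebouncer` class, its `update` method, and the `min_frames` parameter are hypothetical names chosen for illustration.

```python
class StatusDebouncer:
    """Report a status change only after it persists for min_frames frames.

    Hypothetical helper for illustration; not part of the original script.
    """

    def __init__(self, min_frames=5):
        self.min_frames = min_frames
        self.stable_status = None  # last status that was reported
        self.candidate = None      # status currently being counted
        self.count = 0             # consecutive frames showing the candidate

    def update(self, status):
        """Feed one per-frame status; return it if it is a confirmed change, else None."""
        if status == self.candidate:
            self.count += 1
        else:
            self.candidate = status
            self.count = 1
        if self.count >= self.min_frames and status != self.stable_status:
            self.stable_status = status
            return status
        return None


if __name__ == "__main__":
    debouncer = StatusDebouncer(min_frames=3)
    per_frame = ["closed", "closed", "closed", "opened",
                 "closed", "opened", "opened", "opened"]
    for frame_status in per_frame:
        changed = debouncer.update(frame_status)
        if changed is not None:
            # In the camera script, this is where
            # client.publish("book_status", changed) would go.
            print("publish:", changed)
```

With `min_frames=3`, the isolated "opened" in the fourth frame is ignored, and only two messages are produced: one for "closed" and one for "opened".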

Step #4: Calculate and Display the Book Reading Duration with JavaScript

We are going to make a web page that tracks the status from our Python program. It will listen for MQTT messages indicating the status of the book (whether it's opened or closed) and calculate the reading duration accordingly. Here's how the logic works:

Message Handling

The MQTT client is set up to subscribe to the "book_status" topic. When a message is received on this topic, the onMessageArrived function is triggered.

Book Status Tracking

The onMessageArrived function checks if the book has been opened at least once (hasOpened flag).

If the book hasn't been opened yet and the received message indicates that it's "opened", it sets hasOpened to true. This ensures that the tracker only starts counting reading time when the book is first opened.

Reading Start and End Time

Once the book has been opened (hasOpened is true), the function reacts only when the received status differs from the last known status (bookStatus). When the status changes to "opened", it records the start time of reading (readingStartTime). When the status changes to "closed" and a start time has been recorded, it takes the current time as the reading end time, calculates the reading duration, and resets the start time.

If the book is closed without any previous reading start time, it simply logs the end of reading.
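
The tracking logic described above can also be sketched in Python, which makes it easy to check without a browser or a broker. This is a minimal sketch that mirrors the JavaScript variables hasOpened, bookStatus, and readingStartTime. The `ReadingTracker` class and its `on_status` method are hypothetical names, and timestamps are passed in explicitly instead of being read from the clock.

```python
from datetime import datetime, timedelta


class ReadingTracker:
    """Python rendering of the status tracking logic (illustrative only)."""

    def __init__(self):
        self.has_opened = False    # has the book been opened at least once?
        self.book_status = None    # last status that was acted on
        self.reading_start = None  # start time of the current session

    def on_status(self, status, now):
        """Process one status message; return a timedelta when a session ends."""
        if not self.has_opened and status == "opened":
            self.has_opened = True
        # Ignore messages before the first "opened" and repeated statuses
        if not self.has_opened or status == self.book_status:
            return None
        self.book_status = status
        if status == "opened":
            self.reading_start = now
        elif status == "closed" and self.reading_start is not None:
            duration = now - self.reading_start
            self.reading_start = None  # reset for the next session
            return duration
        return None


if __name__ == "__main__":
    tracker = ReadingTracker()
    t0 = datetime(2024, 1, 1, 9, 0)
    tracker.on_status("closed", t0)                         # ignored: not opened yet
    tracker.on_status("opened", t0 + timedelta(minutes=1))  # session starts
    tracker.on_status("opened", t0 + timedelta(minutes=2))  # repeat, ignored
    print(tracker.on_status("closed", t0 + timedelta(minutes=31)))  # 0:30:00
```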

Reading Duration Calculation

The function calculateDuration calculates the duration between the reading start time and end time in hours and minutes.

This duration is then displayed on the web page.
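
As a worked example of the arithmetic, here is a Python sketch equivalent to calculateDuration (the name `calculate_duration` is simply a Python rendering of the JavaScript name). It works on the difference in milliseconds and uses floor division, so leftover seconds are discarded and a session shorter than one minute is reported as 0 hours and 0 minutes.

```python
from datetime import datetime

MS_PER_HOUR = 3_600_000   # 1 hour = 3,600,000 ms
MS_PER_MINUTE = 60_000    # 1 minute = 60,000 ms


def calculate_duration(start_time, end_time):
    """Return the elapsed time as 'H hours and M minutes', dropping seconds."""
    diff_ms = int((end_time - start_time).total_seconds() * 1000)
    hours = diff_ms // MS_PER_HOUR
    minutes = (diff_ms % MS_PER_HOUR) // MS_PER_MINUTE
    return f"{hours} hours and {minutes} minutes"


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0, 0)
    end = datetime(2024, 1, 1, 10, 35, 40)
    # 1 h 35 min 40 s = 5,740,000 ms -> 1 hour, remainder 2,140,000 ms -> 35 minutes
    print(calculate_duration(start, end))  # 1 hours and 35 minutes
```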

In summary, the JavaScript code listens for MQTT messages indicating the status of the book, tracks when the book is opened and closed, calculates the reading duration, and displays it on the web page. The complete code is as follows.

<html>
<head>
<script src="https://cdnjs.cloudflare.com/ajax/libs/paho-mqtt/1.0.1/mqttws31.min.js" type="text/javascript"></script>
<style>
  #status{
    background-color:#f5cff3;
    width:250px;
    height:150px;
    text-align: center;
    margin: auto;
    padding: 50px;
    box-shadow: rgba(6, 24, 44, 0.4) 0px 0px 0px 2px, rgba(6, 24, 44, 0.65) 0px 4px 6px -1px, rgba(255, 255, 255, 0.08) 0px 1px 0px inset;
  }
</style>
</head>
<body>
 <h1 style="text-align: center;">Book Reading Time Tracker</h1>

 <div id="status"><img src="book.png" style="width:50%">
 <h3 id="msg">NA</h3></div>

<script type="text/javascript">
// Create a client instance

const client = new Paho.MQTT.Client("broker.hivemq.com", 8000, "br-client-001");

// set callback handlers
client.onConnectionLost = onConnectionLost;
client.onMessageArrived = onMessageArrived;

// connect the client
client.connect({onSuccess:onConnect});

// called when the client connects
function onConnect() {
  // Once a connection has been made, subscribe to the book status topic.
  console.log("onConnect");
  client.subscribe("book_status");

}

// called when the client loses its connection
function onConnectionLost(responseObject) {
  if (responseObject.errorCode !== 0) {
    console.log("onConnectionLost:"+responseObject.errorMessage);
  }
}

let bookStatus = null; // Variable to store the status of the book
let readingStartTime = null; // Variable to store the start time of reading
let hasOpened = false; // Flag to track whether the book has been opened at least once

// Function to handle MQTT messages
function onMessageArrived(message) {
  const currentTime = new Date().toLocaleTimeString();
  
  if (!hasOpened && message.payloadString === "opened") {
    hasOpened = true;
  }

  if (hasOpened) {
    if (bookStatus !== message.payloadString) {
      bookStatus = message.payloadString;

      if (bookStatus === "opened") {
        readingStartTime = new Date();
        console.log("reading started at " + currentTime);
        document.getElementById("msg").innerHTML = "Reading Started at " + currentTime;
      } else if (bookStatus === "closed") {
        if (readingStartTime) {
          const readingEndTime = new Date();
          const duration = calculateDuration(readingStartTime, readingEndTime);
          console.log("reading ended at " + currentTime + ", duration: " + duration);
          document.getElementById("msg").innerHTML = "Reading Ended at " + currentTime + ", Duration: " + duration;
          readingStartTime = null; // Reset reading start time
        } else {
          console.log("reading ended at " + currentTime);
          document.getElementById("msg").innerHTML = "Reading Ended at " + currentTime;
        }
      }
    }
  }
}

// Function to calculate duration in hours and minutes
function calculateDuration(startTime, endTime) {
  const diff = endTime - startTime;
  const hours = Math.floor(diff / 3600000); // 1 hour = 3600000 ms
  const minutes = Math.floor((diff % 3600000) / 60000); // 1 minute = 60000 ms
  return hours + " hours and " + minutes + " minutes";
}

</script>
</body>
</html>

Here’s the final output of the project:

On the left is the application that detects whether the book is open or closed. On the right is the time-tracking web application that measures how long you have been reading.

Conclusion

In this blog post, we learned how to build a Book Reading Time Tracker using computer vision and IoT. We trained an object detection model to detect whether a book is opened or closed, published its output over MQTT, and used those messages to calculate the reading duration in a web page.

This project demonstrates the practical application of computer vision and IoT technologies in monitoring and enhancing reading habits. Using this application, users can monitor their reading time without manual intervention. The system provides accurate and automated tracking of book reading sessions, enabling users to analyze their reading habits and progress effectively.

Additional improvements could include storing reading sessions in a database and generating analytics based on the collected data.
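
As one possible starting point for the first improvement, the sketch below stores completed sessions in SQLite using Python's built-in sqlite3 module and sums the reading time per day. The table name, column names, and helper functions are hypothetical, and an in-memory database is used so that the example leaves no files behind.

```python
import sqlite3
from datetime import datetime


def open_store(path=":memory:"):
    """Open (or create) the session database; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reading_sessions ("
        "id INTEGER PRIMARY KEY, start_time TEXT NOT NULL, end_time TEXT NOT NULL)"
    )
    return conn


def save_session(conn, start, end):
    """Store one completed reading session as ISO 8601 timestamps."""
    conn.execute(
        "INSERT INTO reading_sessions (start_time, end_time) VALUES (?, ?)",
        (start.isoformat(), end.isoformat()),
    )
    conn.commit()


def minutes_per_day(conn):
    """Return {date: total minutes read}, keyed by the day each session started."""
    totals = {}
    rows = conn.execute(
        "SELECT start_time, end_time FROM reading_sessions ORDER BY start_time"
    )
    for start_text, end_text in rows:
        start = datetime.fromisoformat(start_text)
        end = datetime.fromisoformat(end_text)
        day = start.date().isoformat()
        totals[day] = totals.get(day, 0) + (end - start).total_seconds() / 60
    return totals


if __name__ == "__main__":
    conn = open_store()
    save_session(conn, datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30))
    save_session(conn, datetime(2024, 1, 1, 20, 0), datetime(2024, 1, 1, 20, 45))
    print(minutes_per_day(conn))  # {'2024-01-01': 75.0}
```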

All code for this project is available on GitHub. The dataset used for this project is available on Roboflow Universe.