What is Handwriting Recognition?
Handwriting recognition is the process of converting handwritten physical text into a digital format. Sometimes referred to as handwriting OCR, handwriting recognition (HWR), or handwriting text recognition (HTR), converting written text into a machine-readable format has speed and efficiency advantages that reduce the dependency on manual data entry.
OCR vs Handwriting Recognition
Optical character recognition, or OCR, is similar to handwriting recognition in that it works towards the same common goal of converting visual representations of text into digital ones. Yet, OCR has a uniquely different task than handwritten text recognition since OCR primarily focuses on printed text, which can often be easier to recognize and transcribe.
Handwriting Recognition Use Cases
Handwriting recognition technology is used in cases where human handwriting needs to be converted to a machine-readable format to process information. Let’s review some common use cases where handwriting recognition is used today.
Document Processing
From addresses on letters and mail, where handwriting recognition saved the USPS $90 million dollars in a year in processing costs, to processing handwritten checks or more advanced use cases processing forms and digitizing notes, handwriting recognition is common in many document processing systems.
Retail and Logistics
In the retail and logistics sectors, forms and invoices are often easier for employees than entering information into computers or mobile devices, but when it comes to information that needs to be stored, machine-readable text makes for an alternative that costs less to store and is easier to analyze and calculate with.
Education
Education has also seen the benefits of instantly transcribing handwritten information into digital text, where it has been used to digitize historical documents for research, scan lecture notes for accessibility, and transcribe written problems.
Challenges For Handwriting Recognition
OCR and handwriting recognition have similar histories rooted in rudimentary pattern recognition systems. Although advancements in OCR benefited handwriting recognition and vice versa, the incredible variance of human handwriting in style and neatness created challenges for identifying consistent patterns. However, with the advancement of deep learning and machine learning strategies, as well as the incorporation of transformers in newer OCR and handwriting recognition implementations, deep learning-based models are primarily the state of the art for handwriting recognition.
How to Use Handwriting Recognition
There is a wide range of options to pick from for your handwriting recognition solution, from multimodal large language models (LMMs) to cloud API providers to locally-run packages or GitHub projects, or creating your own solution with a custom dataset. With many options, you can build the application that makes sense for your use case. Let's dive into these options so you can understand what might be best for your use case..
Handwriting Recognition with Large Multimodal Models
Although large multimodal models do not specifically advertise their handwriting recognition abilities, similar to their impressive performance in OCR tasks, LMMs like OpenAI’s GPT-4 with Vision, Anthropic’s Claude 3, and Google Gemini have all shown the ability to perform HTR tasks.
Handwriting Recognition with Cloud API Providers
Aside from LMMs, there are plenty of API providers that do handwriting recognition as a service. Some of these examples include Amazon Web Services Textract, Google Document AI, Microsoft Azure’s Cognitive Services, Pen2Txt, and Rossum.
Open Source Handwriting Recognition Packages & GitHub Projects
While LMMs and APIs do provide good solutions, running handwriting recognition locally on-device can eliminate the per-use or monthly costs of utilizing a hosted service, as well as having the benefit of using it locally without an internet connection. Some packages and GitHub projects that have shown promise in handwriting recognition include TrOCR, SimpleHTR, and Laia.
Handwriting Datasets
Through competitions like the International Conference on Document Analysis and Recognition (ICDAR), as well as other endeavors into HTR, there are quite a few datasets available:
- ICDAR 2023 Competition on Recognition of Multi-line Handwritten Mathematical Expressions
- International Association for Pattern Recognition Te4chnical Committee 11 - Dataset List
- Handwriting Datasets
- Handwritten Forms Datasets
- Handwritten Math Equations Datasets
Handwriting Recognition Tutorial
Now that we have reviewed what handwriting recognition can be used for and what options we have for using it, we will go over an example use case: Extracting information from bank checks. We will use an example image.
Running TrOCR on the entire image, a strange output resulted: `’1903’`. This reveals a problem with most handwriting recognition solutions, they only have the ability to extract text and sometimes treat the entire image as localized text.
To solve this problem, we use this bank check extraction model and run a prediction on it.
from inference_sdk import InferenceHTTPClient
from google.colab import userdata
CLIENT = InferenceHTTPClient(
api_url="https://detect.roboflow.com",
api_key="*ROBOFLOW_API_KEY*"
)
result = CLIENT.infer(image, model_id="chequemodel/1")
Once we run our prediction, we can crop then run TrOCR on the cropped selections:
# Crop images
class_list = detections.data["class_name"]
name_detection = detections[class_list == "Payee_Name"]
name_image = sv.crop_image(image,name_detection.xyxy[0].tolist())
amount_detection = detections[class_list == "Amount_In_Numbers"]
amount_image = sv.crop_image(image,amount_detection.xyxy[0].tolist())
# Run OCR
name_text = run_trocr(name_image)
amount_text = run_trocr(amount_image)
print("Name:",name_text)
print("Amount:",amount_text)
Resulting in a correct extraction of the name and amount:
This process could be adapted to any use case using different object detection models like for forms or by creating your own model with your data.
Conclusion
In this guide, we reviewed the field of handwriting recognition and what it can be used for and what options exist for using it, as well as covering potential options for using handwriting recognition from multimodal models to API providers and open-source packages and projects. We also reviewed an example of how using object detection can be used alongside to create a comprehensive handwriting recognition system.