Optical Character Recognition (OCR) is a problem type in computer vision that aims to identify and recognize characters (i.e. numbers, letters, punctuation) in an image. OCR has many applications across industries. For example, businesses can use OCR to read serial numbers on manufacturing lines for use in inventory management systems.
Even the best OCR systems do not always return exact results, so error correction is an important part of any OCR workflow.
In this guide, we are going to talk about what OCR is, common use cases for OCR, and how to use a free OCR API. We will also discuss how to approach error correction with OCR.
By the end of this guide, we will be able to retrieve the shipping container numbers in this image (MSKU 0439215):
Without further ado, let’s get started!
What is Optical Character Recognition (OCR)?
Optical Character Recognition (OCR) is a method of using computers to read text on an image. In recent years, deep learning methods have been used for OCR, offering superior performance in many cases.
OCR Use Cases
A good way to think about OCR use cases is “when would I need to be able to read text with a computer?” Let’s talk about a few common use cases for OCR.
- Inventory management systems: Inventory systems can use OCR to track the serial numbers of parts as they are prepared for packaging, allowing real-time views on the status of parts in the system (i.e. has the part been sent to a customer?).
- Reading documents: OCR is commonly used to read documents, either written by hand or printed. This approach is taken to allow for extraction of the words on the page, which can then be copy-pasted and edited in a new digital document. OCR is commonly used for digitizing archives so they can be made digitally accessible and searchable.
How to Recognize Characters in an Image
Roboflow maintains a free OCR endpoint you can use to recognize characters in an image or video. The API is powered by DocTR, a machine learning-powered OCR model. The API allows you to retrieve the location of text in visual data. You can then retrieve the text in each location where text was found. You do not need any experience with computer vision to use this API.
The OCR endpoint is available for use in a hosted offering as well as on your device. The latter – running the model on your device – is useful if you need to run OCR in real time, or if you do not have access to a stable internet connection where you need to use OCR.
First, create a free Roboflow account. You can use your account to make 1,000 OCR API calls.
Next, open a terminal and install the dependencies we will use in this guide:
pip install requests roboflow supervision pillow
Suppose we are building an application for the logistics industry. We will start by running a model that can identify both container numbers and ISO types.
We want to read the characters on a shipping container to take inventory of which containers are within a given facility, but not any extraneous characters (e.g. a logo). We can use a model to detect the regions of interest, then use OCR to read the attributes on the container.
Then, create a new Python file called `app.py` and add the following code:
import requests
import base64
from roboflow import Roboflow
from PIL import Image
from io import BytesIO
import supervision as sv

API_KEY = ""
INFER_SERVER_URL = "https://infer.roboflow.com"

# load the fine-tuned container detection model from Roboflow
rf = Roboflow(api_key=API_KEY)
project = rf.workspace().project("container-shipping-number2")
model = project.version(3).model

# run the detection model, then load the predictions into supervision
bounding_boxes = model.predict("container1.jpeg").json()
predictions = sv.Detections.from_roboflow(bounding_boxes)

image = Image.open("container1.jpeg")
classes = [i["class"] for i in bounding_boxes["predictions"]]

for i, _ in enumerate(predictions.xyxy):
    x0, y0, x1, y1 = predictions.xyxy[i]
    class_name = classes[i]

    # expand the crop region by scaling the box coordinates by roughly 10%
    x0 = int(x0 * 0.9)
    y0 = int(y0 * 0.9)
    x1 = int(x1 * 1.1)
    y1 = int(y1 * 1.1)

    cropped_image = image.copy().crop((x0, y0, x1, y1))

    # change to black and white
    cropped_image = cropped_image.convert("L")

    # convert to base64
    buffered = BytesIO()
    cropped_image.save(buffered, format="JPEG")

    data = {
        "image": {
            "type": "base64",
            "value": base64.b64encode(buffered.getvalue()).decode("utf-8")
        }
    }

    # decode and show the cropped image
    img = Image.open(BytesIO(base64.b64decode(data["image"]["value"])))
    img.show()

    # send the crop to the DocTR OCR endpoint
    ocr_results = requests.post(INFER_SERVER_URL + "/doctr/ocr?api_key=" + API_KEY, json=data).json()

    print(ocr_results, class_name)
In this code, we:
- Use a fine-tuned model to identify the container number in an image, then;
- Make a web request to the OCR endpoint to retrieve the text in the image.
You will need to replace two values in the code above:
- `container1.jpeg`: The name of the image on which you want to run OCR, and;
- `API_KEY`: Your Roboflow API key. Learn how to retrieve your Roboflow API key.
Finally, run the Python script:
python app.py
Let’s run our script on the following image:
Our OCR script returns the following response:
{'result': '', 'time': 3.98263641900121} iso-type
{'result': 'MSKU 0439215', 'time': 3.870879542999319} container-number
Our model successfully identified text in the image, but only for one class (the “container-number” class). To mitigate this issue, you can experiment with additional preprocessing steps appropriate for your project.
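For example, one option (a sketch, not a required step) is to boost contrast and binarize each crop before sending it to the OCR endpoint; the 128 threshold below is an assumption you would tune for your images:
from PIL import ImageOps

# inside the loop, after converting the crop to grayscale ("L" mode):
# stretch the contrast, then binarize so faint characters stand out
cropped_image = ImageOps.autocontrast(cropped_image)
cropped_image = cropped_image.point(lambda p: 255 if p > 128 else 0)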
If the OCR model makes a mistake, we recommend that you: (i) ensure the text in the image is as readable as possible (e.g. the lighting conditions make the text visible and the typeface is easy to read), and; (ii) use an error correction system to address mistakes from the OCR endpoint.
We will talk about error correction later in this post.
You can also run the OCR API on a device through Roboflow Inference. Inference enables you to deploy computer vision models. You can deploy pre-trained models that use a supported model architecture (e.g. YOLOv5 and YOLOv8) and use foundation models such as DocTR. Inference runs on a range of devices and architectures, from x86 CPUs to ARM CPUs to NVIDIA GPUs.
For this guide, we are going to use the `inference` pip package. You can also run Inference as a Docker container and make web requests to it to run models on images. To learn more about running Inference as a Docker container, refer to the Inference Docker documentation.
To install the package, run the following command:
pip install inference-cli
Then, start an inference server:
inference server start
This server will run at http://localhost:9001 by default.
Next, update the Python script from earlier to change the INFER_SERVER_URL value from “https://infer.roboflow.com” to “http://localhost:9001”. This will ensure both the shipping container model and OCR model are run locally.
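With the script above, this is a one-line change:
# point the OCR request at the local Inference server instead of the hosted endpoint
INFER_SERVER_URL = "http://localhost:9001"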
Replace the `container1.jpeg` file name with the name of the image on which you want to run inference, and replace the API_KEY placeholder with your Roboflow API key. Then, run the script.
The first time this script runs, the DocTR model will be downloaded onto your device. The time this will take depends on the strength of your internet connection. After the model weights have been downloaded, the weights will be cached locally for use in future inferences.
Once you have the model weights ready on your system, OCR will run on the provided image.
OCR Error Correction
OCR works well when text is clearly visible and in a typeface the OCR model can understand. You can test how well an OCR system works for your use case by running the system on a few images and evaluating the results.
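As a minimal sketch, you could compare OCR output against a handful of hand-labeled images and count exact matches; `run_ocr()` below is a hypothetical wrapper around the OCR request shown earlier, and the file names and expected values are placeholders:
# hypothetical ground truth you label by hand
test_set = {
    "container1.jpeg": "MSKU 0439215",
    "container2.jpeg": "TGHU 1234567",
}

def normalize(text):
    # ignore case and whitespace differences when comparing
    return "".join(text.upper().split())

# run_ocr(path) is a hypothetical helper that wraps the OCR request above
correct = sum(
    normalize(run_ocr(path)) == normalize(expected)
    for path, expected in test_set.items()
)
print(f"Exact matches: {correct}/{len(test_set)}")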
With that said, OCR models do make mistakes, even when text is clearly readable. That is why systems that use OCR often employ error correction, a technique whereby you read the output returned by an OCR model and clean it up as necessary.
A common and effective approach to error correction is to use heuristics to evaluate an OCR result. Consider a scenario where you know that the text you are reading should only contain numbers. If OCR returns an “l” instead of a “1”, you can make the requisite correction.
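As a minimal sketch, a lookup table of commonly confused characters can clean up fields you know should be numeric (the specific confusion pairs below are assumptions you would adjust for your data):
# characters OCR commonly confuses with digits (assumed pairs; tune for your data)
CONFUSIONS = {"l": "1", "I": "1", "O": "0", "o": "0", "S": "5", "B": "8"}

def correct_numeric_field(text):
    # apply to fields that should only contain digits, e.g. a container serial number
    return "".join(CONFUSIONS.get(ch, ch) for ch in text)

print(correct_numeric_field("O43921S"))  # "0439215"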
If you know a document should contain one of a few expected values in a particular location, you can take the expected value that is closest to the OCR output. Such logic is commonly implemented in code since you can define exactly which rules to apply to correct errors. Read about how an insurance company applies OCR error correction on documents.
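For example, here is a sketch of this idea using Python's built-in `difflib`; the list of expected codes is a placeholder:
import difflib

# the set of values you expect to see in a given field (placeholder codes)
EXPECTED_ISO_TYPES = ["22G1", "42G1", "45G1"]

def snap_to_expected(ocr_text, expected, cutoff=0.6):
    # return the closest expected value, or the raw OCR text if nothing is close
    matches = difflib.get_close_matches(ocr_text.strip().upper(), expected, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_text

print(snap_to_expected("22Gl", EXPECTED_ISO_TYPES))  # "22G1"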
Alternatively, you can use a spelling correction algorithm. This is useful if you have written prose from which you want to extract text. Tools like SymSpell, of which there are implementations in many programming languages, enable you to clean up OCR results.
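For instance, here is a minimal sketch using the `symspellpy` package, assuming you have downloaded its English frequency dictionary file to the path below:
from symspellpy import SymSpell

sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
# frequency_dictionary_en_82_765.txt ships with symspellpy; the local path is an assumption
sym_spell.load_dictionary("frequency_dictionary_en_82_765.txt", term_index=0, count_index=1)

# correct a noisy OCR transcription of prose
suggestions = sym_spell.lookup_compound("the shipment was recieved on the dok", max_edit_distance=2)
print(suggestions[0].term)  # e.g. "the shipment was received on the dock"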
You could also use two different OCR models and develop logic to arrive at a probable answer. For example, one model may work well with straight, horizontal text, while another may work better with text at an angle. You could run both models and choose the output that looks correct, or apply fixes to both outputs to see if either can be parsed. You can also manually review OCR results.
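For example, here is a sketch that accepts whichever output parses as a valid container number (four letters followed by seven digits); the two result arguments stand in for the outputs of your two OCR systems:
import re

# container numbers look like four letters followed by seven digits
CONTAINER_PATTERN = re.compile(r"^[A-Z]{4}\s?\d{7}$")

def pick_result(result_a, result_b):
    for result in (result_a, result_b):
        if CONTAINER_PATTERN.match(result.strip().upper()):
            return result
    return None  # neither output parsed; flag the image for manual review

print(pick_result("MSKU O439215", "MSKU 0439215"))  # "MSKU 0439215"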
Conclusion
OCR is a common task in computer vision. With OCR, you can identify the characters in an image. You can also identify the location of each unit of text (i.e. a word or a series of numbers). This location information can help you understand the structure of a document.
In this guide, we used the Roboflow hosted OCR API to retrieve the text in an image. This API uses the DocTR OCR model. We then used the `inference` pip package to run inference on an image locally. Finally, we discussed error correction in OCR.