How to Use EasyOCR
Published Oct 16, 2024 • 7 min read

EasyOCR is a Python package for detecting and extracting text from images such as photos or scanned documents. It comes with pre-trained models designed to make text recognition fast and efficient and supports over 80 languages.

In this article, we’ll explore how to use the EasyOCR package to detect and extract text from images. Let’s get started!

How Does EasyOCR Work

EasyOCR’s internal text recognition process involves three main steps: feature extraction, sequence labeling, and decoding.

eep learning models like ResNet and VGG (Visual Geometry Group) extract key features from images, LSTM (Long Short-Term Memory) networks handle the sequential context, and the CTC (Connectionist Temporal Classification) algorithm converts these labeled sequences into readable text.

An Example of Using EasyOCR. (Source)

How To Use EasyOCR to Extract Text From Images

Now that we have taken a look at what EasyOCR is and some of its use cases, next, we’ll explore a coding example that showcases how to use EasyOCR to extract text from an image.

Step #1: Installation

To get started, install all the necessary libraries required. We will be installing the PyTorch library (which is a core dependency) and the EasyOCR library, using the ‘pip’ package installer. Open a command prompt or terminal and run the following commands to start installations.

pip install torch torchvision torchaudio easyocr

Step #2: Initializing the Reader

After installing the dependencies, we can import the EasyOCR package and initialize its ‘reader’ function. We can also select the language we want to detect and extract. There are over 80 languages to choose from. For this example, let’s choose English as ‘en.’ 

import easyocr
reader = easyocr.Reader(['en'])

In the ‘reader’ function, we can also manage system specification settings like enabling or disabling GPU or selecting a custom directory to store the EasyOCR models. For the EasyOCR library, GPU is recommended for faster processing, but it can be disabled to work on your CPU as well. To disable the GPU setting (it is set to ‘True’ by default):

reader = easyocr.Reader(['en'],gpu=False)

To choose a custom directory for model storage:

reader = easyocr.Reader(['en'],model_storage_directory='path/to/directory'
)

Step #3: Running EasyOCR on an Image

Once the reader function is initialized, the models will be automatically downloaded (to a custom directory, if chosen). Next, we can initialize the ‘readtext’ function and pass the path of an input image to it. 

result = reader.readtext('/path/of/image')
print(result)

In this example, we will use an image that contains the serial and part numbers of a product as input. You can also use the same image or any other relevant image.

The Input Image. (Source)

When we print out the results, it will include the bounding box coordinates of the detected text within the image, the detected text, and the confidence scores. 

Here’s the output returned for our input image:

[([[28, 22], [353, 22], [353, 72], [28, 72]], 'SERIAL NUMBER', 0.8874381662616708), ([[35, 75], [397, 75], [397, 137], [35, 137]], 'AOC1715821', 0.8521895819573561), ([[39, 255], [315, 255], [315, 299], [39, 299]], 'PART NUMBER', 0.9971079202290591), ([[42, 298], [370, 298], [370, 354], [42, 354]], '9-00864-01', 0.8142346378327698)]

The bounding box coordinates are shown first, followed by the extracted text, and finally, the confidence score. But ideally, we don’t need all this data; we only need the extracted text. EasyOCR allows for easy output customization options. Next, we’ll focus on customizing and visualizing the output.

Step #4: Configure Output Format

EasyOCR library offers many different output customization options. We can avoid retrieving bounding box coordinates and confidence scores by setting the ‘detail’ parameter to zero within the ‘readtext’ function as shown below.

import easyocr
reader = easyocr.Reader(['en'])
result = reader.readtext('/path/of/image',detail=0) 
print(result)

For our input image, this would be the result that is printed. As you can see, we have filtered out only the detected text. 

['SERIAL NUMBER', 'AOC1715821', 'PART NUMBER', '9-00864-01']

We can also group the text together by setting ‘paragraph’ to true, as shown below. 

result = reader.readtext('/path/of/image',detail=0, paragraph=True) 
print(result)

This would be the printed result for our input image. The extracted texts are now grouped together.

['SERIAL NUMBER AOC1715821', 'PART NUMBER 9-00864-01']

Here’s an example of a code snippet to print out the results line by line:

import easyocr
reader = easyocr.Reader(['en'])
result = reader.readtext('/path/of/image')
for res in result:
  coord=res[0]
  text=res[1]
  conf=res[2]
  print(text)

This would be the printed result for our input image. The extracted texts are printed out line by line in order of appearance in the document.

SERIAL NUMBER
AOC1715821
PART NUMBER
9-00864-01

If instead of printing the variable ‘text,’ we printed the variable ‘coord’ and ‘conf,’ we would get similar outputs with the bounding box coordinates and confidence scores.

Step #5: Visualize Results

We can also visualize the predictions of the EasyOCR module overlaid on top of the input image. This can be easily done using the Supervision Python package. Supervision provides a range of reusable computer vision tools for tasks like annotating predictions generated by various computer vision models.

In order to use the library, we need to install the Supervision Python package as shown below:

pip install supervision

After installing the module, this code uses EasyOCR to detect text in an image and annotate it with bounding boxes and labels. It initializes the EasyOCR reader for English, processes the image to extract text, bounding box coordinates, and confidence scores, and stores the data in lists.

Supervision’s annotators are then used to overlay bounding boxes and text on the image. Finally, the annotated image is displayed and saved as "Output.jpg," providing a complete workflow for OCR and visual annotation.

import easyocr
import supervision as sv
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

# Image path
Image_path = '/path/to/image'

# Initialize EasyOCR reader (English language, CPU)
reader = easyocr.Reader(['en'], gpu=False, model_storage_directory='/path/to/directory')

# Perform text detection on the image
result = reader.readtext(Image_path)

# Load image using OpenCV
image = cv2.imread(Image_path)

# Prepare lists for bounding boxes, confidences, class IDs, and labels
xyxy, confidences, class_ids, label = [], [], [], []

# Extract data from OCR result
for detection in result:
    bbox, text, confidence = detection[0], detection[1], detection[2]
   
    # Convert bounding box format
    x_min = int(min([point[0] for point in bbox]))
    y_min = int(min([point[1] for point in bbox]))
    x_max = int(max([point[0] for point in bbox]))
    y_max = int(max([point[1] for point in bbox]))
   
    # Append data to lists
    xyxy.append([x_min, y_min, x_max, y_max])
    label.append(text)
    confidences.append(confidence)
    class_ids.append(0)  

# Convert to NumPy arrays
detections = sv.Detections(
    xyxy=np.array(xyxy),
    confidence=np.array(confidences),
    class_id=np.array(class_ids)
)

# Annotate image with bounding boxes and labels
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

annotated_image = box_annotator.annotate(scene=image, detections=detections)
annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections, labels=label)

# Display and save the annotated image
sv.plot_image(image=annotated_image)
cv2.imwrite("Output.jpg", annotated_image)

Here’s what the output image will look like:

Here’s another example output image obtained using a different input image:

OCR Challenges and Considerations

Using OCR tools like EasyOCR to detect and extract text from images is not always easy. To get the best results, it’s important to consider several key factors. The quality of the image is crucial, it needs to be clear, well-illuminated, and have a high resolution. Good-quality images help the OCR models accurately detect and recognize characters, while poor-quality images lead to misinterpretations. 

Image quality can be enhanced using image preprocessing techniques like sharpening, noise reduction, contrast adjustment, and image normalization. Another factor to consider with EasyOCR is the language selection. The library supports over 80 languages. It’s crucial to correctly specify the language used in the input images.

An example of image preprocessing that can help with accurate OCR. (Source)

You can also consider batch processing and fine-tuning to improve the efficiency. Batch processing lets you process multiple images at once, saving time and resources when handling large volumes of documents. EasyOCR's fine-tuning options can be used to customize how the engine works for your specific needs. Adjusting settings can also optimize the model for the type of content you're working with, leading to better accuracy and performance.

Applications of EasyOCR

EasyOCR can come in handy for a wide variety of tasks, including image analysis and automated data entry in many languages. 

Consider the retail sector - EasyOCR can be used to automate tasks like generating price tags and tracking inventory by reading product barcodes and labels in retail stores. Take education, EasyOCR can digitize textbooks and research papers for better accessibility and analysis.

Along the same lines, the manufacturing sector can take advantage of EasyOCR for a lot of different tasks. It can automate functions like reading product details for shipment, digitizing documents such as work orders and blueprints, and tracking inventory levels and expiration dates. It can even support quality control by inspecting product labels, barcodes, and serial numbers - similar to what we’ll cover in the coding example in the next section.

Conclusion

EasyOCR is a simple yet reliable tool that makes detecting and extracting text from images easy for developers and non-developers alike. With support for many languages, pre-trained models, and GPU-enabled fast processing, it’s useful across many industries, such as healthcare, finance, retail, and manufacturing. Its flexibility, ease of use, and ability to handle different text formats and languages make it a great choice for tasks that need quick and accurate OCR.

Keep Exploring

Cite this Post

Use the following entry to cite this post in your research:

Abirami Vina. (Oct 16, 2024). How to Use EasyOCR. Roboflow Blog: https://blog.roboflow.com/how-to-use-easyocr/

Discuss this Post

If you have any questions about this blog post, start a discussion on the Roboflow Forum.

Written by

Abirami Vina
I write because it's the next best thing to Dumbledore's Pensieve.