Pick and Pack Sorting with Computer Vision
Pick and pack sorting is the process of selecting items from inventory (picking) and organizing them for shipment or distribution (packing). It is a key workflow in logistics, warehousing, and fulfillment centers, where customer orders are fulfilled by accurately picking the requested items and packing them securely for delivery.
In this guide, we are going to walk through what pick and pack sorting is, how it works, and how you can use computer vision to identify the location of objects for use in picking and packing with robotic arms.
Let's get started!
How Pick and Pack Sorting Works
The key steps in pick and pack sorting are:
- Picking: Retrieving items from storage based on specific orders or requirements.
- Sorting: Organizing picked items by order, destination, or category.
- Packing: Securing items into boxes or parcels, often with protective materials, labels, and documentation.
In pick and pack sorting, a camera and a computer vision algorithm are used to detect and classify objects, while a robotic system (generally a robotic arm) and conveyor belts execute precise pick-and-pack actions to sort each object. The workflow involves the following components.
Robotic Picking Systems
At the picking end, robotic arms with grippers or suction mechanisms pick items from shelves, using a computer vision system to identify and locate each item.
Automated Sorting Systems
Conveyor belts with computer vision sort items by size, weight, or destination. Advanced algorithms optimize sorting routes.
Packing Automation
At the packing end, robotic arms pack items into designated boxes or bins. Computer vision systems verify the packing area and guide the robotic arm to place the item into a box or container accurately. The following figure illustrates how the system works.
The above figure shows that items are picked from storage shelves or bins based on specific orders and placed on conveyor belts. The items are then sorted by order, destination, or category, and packed securely in appropriate packaging for shipping (e.g., boxes or parcels).
In pick-and-pack sorting systems, computer vision plays an important role: it guides robotic arms to identify which object to pick and where to place it, and it enables the sorting system to classify and sort objects based on predefined criteria. To understand how computer vision assisted sorting works, read the blog titled Automated Sorting with Computer Vision. In the following section, we will look at how a robotic arm identifies an object and its location in order to pick it.
How Robotic Arms Work
The steps a robotic arm follows to pick an object are described below. The robotic arm is equipped with a camera and a computer vision system.
Step #1: Object Detection and Recognition
Cameras capture images of the workspace where objects are located. Computer vision algorithms, such as YOLO, detect and classify objects in the images. The system identifies key attributes like object type, size, shape, color, or labels.
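As a quick illustration of this step, here is a minimal detection sketch using the Ultralytics YOLO package. The weights file and image path are placeholders; the full Roboflow-based detection pipeline is built later in this guide.

from ultralytics import YOLO

# Placeholder pretrained weights; swap in a model trained on your items
model = YOLO("yolov8n.pt")
results = model("workspace.jpg")  # assumed image of the picking workspace

for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding box corners in pixels
    print(f"{cls_name} ({float(box.conf):.2f}) at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")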
Step #2: Object Localization
Once objects are detected, the system calculates their positions in the camera's field of view, typically as 2D image coordinates (x, y). For 3D localization, depth cameras or stereo vision determine the object's distance (z-coordinate) from the camera.
Step #3: Coordinate Transformation
The detected coordinates are transformed from the camera’s frame of reference to the robotic arm’s coordinate system using calibration techniques like hand-eye calibration. There are two primary hand-eye configurations:
- Eye-in-Hand
- Eye-to-Hand
Eye-in-Hand: In this configuration, the camera is mounted on the robot's end-effector, moving with the robot. This setup provides dynamic viewpoints and is beneficial for tasks requiring close-up inspections or operations in varying positions.
Eye-to-Hand: In this configuration, the camera is fixed in the workspace, observing the robot's movements from a stationary position. This configuration is useful for monitoring larger work areas and tasks where the robot's end-effector needs to be unobstructed.
The result of the hand-eye calibration process is the object's precise location in 3D space relative to the robot, which is essential for grasping the detected object.
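To make the transformation concrete, here is a minimal sketch of applying a calibrated camera-to-base transform to a 3D point detected in the camera frame. The 4×4 matrix values below are placeholders; in practice they come from your hand-eye calibration routine.

import numpy as np

# Placeholder camera-to-robot-base transform from hand-eye calibration:
# a rotation block (identity here) plus a translation in meters.
T_base_camera = np.array([
    [1.0, 0.0, 0.0, 0.50],
    [0.0, 1.0, 0.0, 0.10],
    [0.0, 0.0, 1.0, 0.80],
    [0.0, 0.0, 0.0, 1.0],
])

# Object position in the camera frame (from detection + depth), in meters,
# expressed in homogeneous coordinates.
p_camera = np.array([0.12, -0.04, 0.65, 1.0])

# Transform into the robot base frame; the arm plans its grasp in this frame.
p_base = T_base_camera @ p_camera
print("Object in robot base frame:", p_base[:3])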
Step #4: Grasp Planning
Based on the object's shape and orientation, the system identifies the best point for the robotic arm’s gripper to pick it up. If required, the robotic arm adjusts its approach angle to align with the object’s orientation.
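As a simple illustration of deriving a grasp point and approach angle from an image, the sketch below fits a rotated rectangle to an object mask and uses its center and angle to orient a parallel gripper. The binary mask here is a stand-in; in practice it comes from your detection or segmentation step.

import cv2
import numpy as np

# Stand-in binary mask of the detected object
mask = np.zeros((480, 640), dtype=np.uint8)
cv2.rectangle(mask, (200, 150), (400, 260), 255, -1)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
(cx, cy), (w, h), angle = cv2.minAreaRect(max(contours, key=cv2.contourArea))

# Grasp at the rectangle center, with the gripper rotated to the object's angle
print(f"Grasp point: ({cx:.0f}, {cy:.0f}), gripper yaw: {angle:.1f} degrees")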
Step #5: Robotic Arm Motion
The robotic system calculates the optimal trajectory to move the arm from its current position to the object. The arm executes the grasp using its end effector (e.g., gripper or suction cup).
Step #6: Sorting and Placement
Based on predefined criteria (e.g., type, size, or destination), the system determines where the object should be placed. The robotic arm moves the object to the designated bin, conveyor belt, or packaging area.
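The routing decision is often just a lookup from detected class to destination. A minimal sketch, with hypothetical class and bin names:

# Predefined criteria: detected class -> destination (names are hypothetical)
routing = {
    "small_box": "bin_A",
    "large_box": "bin_B",
    "fragile": "padded_packing_station",
}

def destination_for(class_name):
    """Return the drop-off location for a detected class, defaulting to review."""
    return routing.get(class_name, "manual_review_bin")

print(destination_for("fragile"))   # padded_packing_station
print(destination_for("unknown"))   # manual_review_bin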
Step #7: Feedback Loop
Sensors and cameras monitor the process to ensure successful picking and placement. If a failure occurs, the system recalibrates or retries the action.
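A feedback loop can be as simple as re-checking the scene after each attempt. The sketch below shows the retry pattern; attempt_grasp, object_still_present, and redetect are hypothetical stand-ins for your robot and vision interfaces.

def pick_with_retries(target, max_retries=3):
    """Retry the grasp until verification succeeds or retries run out."""
    for attempt in range(max_retries):
        attempt_grasp(target)                 # hypothetical robot command
        if not object_still_present(target):  # hypothetical vision check
            return True                       # object was removed, so the pick worked
        # Pick failed: re-detect the object before the next attempt,
        # since it may have shifted during the failed grasp.
        target = redetect(target)             # hypothetical re-localization
    return False                              # flag for manual intervention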
The following figure shows the working of the robotic arm in a tomato-picking application.
Computer Vision Applications for Pick & Pack Sorting System
There are many computer vision applications used in pick and pack sorting systems. Some popular applications are discussed here.
Automated Barcode and QR Code Scanning
Robotic arms equipped with cameras use OCR (Optical Character Recognition) or code-scanning algorithms to automatically read barcodes or QR codes on items. This technology enables the system to identify the SKU (Stock Keeping Unit) of each item and cross-check it with the order database to ensure that the correct items are picked and packed. For example, in an e-commerce warehouse, a robotic arm scans a QR code on a package, identifies it as a smartphone case of a specific model, and verifies it against the order details. If the item matches the order, it proceeds to packing; if not, it is flagged for review. The following steps describe how this works in pick-and-pack sorting (a minimal decoding sketch follows the list):
- The camera mounted on the robotic arm scans the barcode or QR code on the item to extract information like SKU or product details.
- The system matches the extracted SKU with the order database to confirm that the item corresponds to the order requirements.
- Based on the verification, the system decides whether to accept or flag the item. Correct items are cleared for sorting, while incorrect items are sent to a review area.
- The robotic arm picks up the verified item and places it in the correct bin or packing area. If an error is detected, the arm places the item in a designated "error bin" for manual inspection.
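As a minimal sketch of the scan-and-verify step, the code below decodes a barcode from an image with the pyzbar library and checks the decoded SKU against a small in-memory order database. The dictionary and image path are placeholders standing in for a real order system and camera feed.

import cv2
from pyzbar.pyzbar import decode

# Stand-in for the order database: SKU -> order details
order_database = {"SKU-12345": "smartphone case, model A"}

frame = cv2.imread("package.jpg")  # assumed image of the item's label

for code in decode(frame):
    sku = code.data.decode("utf-8")
    if sku in order_database:
        print(f"Verified: {sku} -> {order_database[sku]}, route to packing")
    else:
        print(f"Mismatch: {sku} not on the order, route to error bin")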
You can use this barcode detection API built with Roboflow to detect barcodes. Follow the Roboflow blog How to Use a Free Computer Vision Barcode Detection API to learn how to build a barcode detection and reading application.
Dimension Validation
Computer vision systems play an important role in validating that items match their expected size during the packaging and sorting process. This ensures that appropriate packing materials are used and helps detect errors such as incorrect or extra items being included. For example, in e-commerce fulfillment centers, computer vision verifies that packages meet courier size requirements before shipping to avoid surcharges or rejections. By automating these checks, the system enhances efficiency and reduces human error, ensuring a smooth shipping process. The following steps explain how a dimension validation system works in pick-and-place sorting for boxes of varying dimensions, a common use case in warehouses and logistics centers (a measurement sketch follows the list):
- A camera mounted above the conveyor belt captures images of each box as it moves along. Computer vision software analyzes the dimensions (length, width, etc.) of the box from the images.
- The dimensions measured by the computer vision model (and the weight, if a scale is integrated) are compared against predefined values. The system ensures the box matches its expected size.
- Based on the scan and dimension validation results, the system determines the appropriate destination for each item. Correctly validated items are assigned to specific bins, packing stations, or conveyor paths according to their SKU or order requirements.
- A robotic arm, equipped with a gripper or suction system, picks up the boxes based on the sorting decision and places them into the correct bins or onto designated conveyor belts. The robotic arm operates based on instructions from the computer vision system, ensuring accurate picking or placement.
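Here is a minimal measurement sketch, assuming a fixed overhead camera whose millimeters-per-pixel scale was calibrated in advance. The scale, expected dimensions, tolerance, and image path below are all placeholders.

import cv2

MM_PER_PIXEL = 0.8     # placeholder scale from a prior camera calibration
EXPECTED = (300, 200)  # expected box length x width in mm
TOLERANCE = 10         # allowed deviation in mm

frame = cv2.imread("box_top_view.jpg")  # assumed overhead image of the box
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

box = max(contours, key=cv2.contourArea)  # largest contour = the box
x, y, w, h = cv2.boundingRect(box)
length_mm, width_mm = sorted((w * MM_PER_PIXEL, h * MM_PER_PIXEL), reverse=True)

ok = all(abs(m - e) <= TOLERANCE for m, e in zip((length_mm, width_mm), EXPECTED))
print(f"{length_mm:.0f} x {width_mm:.0f} mm ->", "accept" if ok else "reject")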
The steps to build a dimension measurement application are explained in the Roboflow blog Dimension Measurement with Computer Vision.
Label Verification
In label verification pick-and-place sorting applications, computer vision systems work alongside robotic arms to identify, verify, and sort packages by their labels, such as shipping addresses, gift notes, or dietary information. The system works in the following steps (an OCR sketch follows the list):
- A camera captures images of packages as they move along a conveyor belt. The computer vision system analyzes these images to detect and verify the labels applied to the packages.
- The system checks the labels for accuracy. For example, it ensures that a shipping address matches the intended destination or that a dietary label like "Gluten-Free" is present and correctly applied.
- Based on the verified label information, the system determines the appropriate sorting bin or destination for each package. For example, packages with dietary labels might be routed to specific bins for allergen-free orders.
- The robotic arm, equipped with grippers or suction mechanisms, is directed by the computer vision system to pick up each package. It moves with precision to place the package in the correct sorting bin or onto the appropriate conveyor belt.
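A minimal verification sketch, assuming pytesseract for OCR and a simple substring check against the expected destination; the image path and expected address are placeholders.

import cv2
import pytesseract

EXPECTED_CITY = "Austin, TX"     # placeholder expected destination

frame = cv2.imread("label.jpg")  # assumed image of the shipping label
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
text = pytesseract.image_to_string(gray)  # OCR the label text

# Route based on whether the expected destination appears on the label
if EXPECTED_CITY.lower() in text.lower():
    print("Label verified: route to the", EXPECTED_CITY, "bin")
else:
    print("Label mismatch: route to manual review")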
Read the Roboflow blog What is OCR Data Extraction? to learn how to build a computer vision application that reads information from product labels.
Building a Computer Vision Model for a Pick and Pack Sorting System
In pick and pack sorting systems, a robotic arm guided by a camera and a computer vision model is used. Designing an application that enables a robotic arm to recognize objects and identify their locations involves integrating computer vision with robotic control systems. The following video shows how a robotic arm picks an object and sorts it by placing it in a designated area. The robotic arm uses a computer vision model to detect and identify the object.
We will now build a computer vision model and write an inference script to detect and classify an object and find its center point coordinates, to assist the robotic arm in the pick and pack sorting system. This center point will then be used with a depth camera, which provides depth information for each pixel in the image, helping the robotic arm estimate the distance to the object in order to pick it.
This depth information is important for a robotic arm because it allows the arm to accurately grasp objects. By knowing the distance to an object, the arm can precisely position its gripper to pick it up without colliding with other objects or missing the target entirely.
The center point of a detected object is used because it represents the object's approximate location in the image. Depth cameras typically provide depth information for each pixel in the image. By focusing on the center point of the object's bounding box, the depth value corresponding to that specific point is obtained. This depth value, combined with the camera's intrinsic parameters (such as focal length and sensor size), can be used to calculate the distance to the object's center. The following are the steps to build the project.
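To make the geometry concrete, here is a minimal sketch that back-projects the bounding box center into a 3D point in the camera frame using the depth value and pinhole camera intrinsics. The intrinsic values below are placeholders; real values come from your depth camera's calibration.

# Placeholder pinhole intrinsics (from the depth camera's calibration)
fx, fy = 615.0, 615.0  # focal lengths in pixels
cx, cy = 320.0, 240.0  # principal point in pixels

def pixel_to_3d(u, v, depth_m):
    """Back-project pixel (u, v) with depth in meters into camera coordinates."""
    X = (u - cx) * depth_m / fx
    Y = (v - cy) * depth_m / fy
    return X, Y, depth_m

# Example: bounding box center at (350, 210) with a measured depth of 0.65 m
print(pixel_to_3d(350, 210, 0.65))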
Step #1: Collect and label dataset
Create an object detection project in Roboflow and collect and upload the images. I have uploaded the following images of hardware development kits.
After uploading the images, label images using Roboflow annotate.
Step #2: Train the computer vision model
After labeling all images, create the dataset and train a computer vision model using Roboflow auto-train. Roboflow auto-train allows you to train computer vision models without needing to write training code.
The above figure shows the training metrics after training the model.
Step #3: Build the inference script
The following is the inference script that runs the trained model on a camera stream and draws the bounding box and provides the center point coordinates in the visualization.
import cv2
from roboflow import Roboflow

# Initialize Roboflow
rf = Roboflow(api_key="ROBOFLOW_API_KEY")
project = rf.workspace().project("pick-n-pack")
model = project.version("1").model

# Open the camera (default camera is index 0)
camera = cv2.VideoCapture(0)

if not camera.isOpened():
    print("Error: Could not open camera.")
    exit()

print("Press 'q' to quit the application.")

while True:
    # Capture frame-by-frame
    ret, frame = camera.read()
    if not ret:
        print("Error: Failed to capture frame.")
        break

    # Save the frame temporarily for inference
    temp_frame_path = "temp_frame.jpg"
    cv2.imwrite(temp_frame_path, frame)

    # Perform inference on the frame
    predictions = model.predict(temp_frame_path, confidence=40, overlap=30).json()

    # Process and visualize predictions
    for pred in predictions["predictions"]:
        # Extract prediction details
        x, y = int(pred["x"]), int(pred["y"])  # Center of the bounding box
        width, height = int(pred["width"]), int(pred["height"])
        class_name = pred["class"]
        confidence = pred["confidence"]

        # Calculate the top-left and bottom-right corners of the bounding box
        x1, y1 = x - width // 2, y - height // 2
        x2, y2 = x + width // 2, y + height // 2

        # Draw the bounding box
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

        # Draw the center point
        cv2.circle(frame, (x, y), 5, (0, 0, 255), -1)

        # Display the class name, confidence, and center point
        label = f"{class_name} ({confidence:.2f})"
        center_text = f"Center: (x = {x}, y = {y})"
        cv2.putText(frame, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        cv2.putText(frame, center_text, (x1, y1 - 30), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

    # Display the frame with annotations
    cv2.imshow("Camera Feed", frame)

    # Break the loop on 'q' key press
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the camera and close all OpenCV windows
camera.release()
cv2.destroyAllWindows()
The following is the result you see when you run the code.
The output image shows the center point of the detected object, which is used to extract the depth value for the object so that a robotic arm equipped with a depth camera can pick it or otherwise interact with it.
Conclusion
In this blog, we learned how pick and pack sorting systems integrate computer vision for picking, sorting, and packing products in logistics and warehouse operations. We explored the key concepts and applications, and how Roboflow helps you build applications for pick and pack sorting systems. With the help of computer vision, these systems enable precise and efficient handling of inventory. From barcode scanning to dimension validation and label verification, computer vision ensures accuracy and reduces human error.