According to Gartner, 85% of machine learning projects fail. Worse yet, Gartner predicts that this trend will continue through 2022. So, when you do get a model into production, it's important that inference is both accurate and fast.

One way to address this issue is to use accelerated inference, a technique that can speed up the inference process by using specialized hardware and/or libraries.

Intel, in collaboration with Microsoft, has redefined inference on Intel® hardware by integrating OpenVINO™ with Torch-ORT while maintaining the native PyTorch experience.

When it comes to PyTorch models specifically, this gives you the ability to use the PyTorch APIs and achieve accelerated inference performance gains on Intel® hardware.

This post will show you how to get a ~13% improvement in performance by only adding 2 lines of code.

OpenVINO™ Integration with Torch-ORT

OpenVINO™ integration with Torch-ORT lets PyTorch developers stay within their chosen framework while still getting the speed and inference power of the OpenVINO™ toolkit, through inline optimizations that accelerate your PyTorch applications.

Benefits of OpenVINO™ integration with Torch-ORT:

  • Easy Installation — Install OpenVINO™ integration with Torch-ORT with pip
  • Simple API — No need to refactor existing code: just import OpenVINO™ integration with Torch-ORT, set your desired target device for inference, and wrap your model (see the sketch after this list)
  • Performance — Achieve higher inference performance over native PyTorch
  • Support for Intel devices — Intel® CPUs, Intel® integrated GPUs, Intel® VPUs
  • Inline Model Conversion — No explicit model conversion steps required
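
As a rough illustration of what the API looks like in practice, here is a minimal sketch (not taken from the notebook; the ResNet-50 model and dummy input are assumptions used purely for illustration):

import torch
import torchvision
from torch_ort import ORTInferenceModule

# Any PyTorch model in eval mode works; ResNet-50 is used here only as an example
model = torchvision.models.resnet50(pretrained=True).eval()

# Wrap the model so inference runs through OpenVINO™ via Torch-ORT
model = ORTInferenceModule(model)

# Inference calls look exactly like native PyTorch
dummy_input = torch.randn(1, 3, 224, 224)
predictions = model(dummy_input)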

In this case study, we re-trained a PyTorch YOLOv7 model on a custom open-source dataset from Roboflow Universe and compared its inference performance with OpenVINO™ Integration with Torch-ORT against native PyTorch on Google Colab.

With that, let's jump into the notebook. We will not walk through the full YOLOv7 notebook in this tutorial, but rather focus on the inference results of native PyTorch vs. OpenVINO™ Integration with Torch-ORT.

The goal of this tutorial is to compare inference performance on an Intel® CPU, so we split the work into two notebooks:

  • In the first notebook, we utilize GPU compute for training YOLOv7. We train the model and save it to Google Drive, to be used by the second notebook for inference.
  • In the second notebook, we run inference on Intel® CPU compute to benchmark native PyTorch performance vs. OpenVINO™ Integration with Torch-ORT.

All corresponding code for this case study can be found in this GitHub repo.

Comparing GPU vs CPU Costs for Inference

Generally speaking, GPU inference performance is around 3x that of a CPU. However, this comes at a significant cost increase.

For example, the AWS 3rd Gen Intel® Xeon® Scalable (Ice Lake) instance, c6i.2xlarge, has an on-demand cost of $0.34 per hour, whereas the NVIDIA GPU-based p3.2xlarge instance costs $3.06 per hour. If you can get an accelerated inference performance boost while keeping the cost savings of CPU for your inference workloads, you are in a much better financial position.
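
To make that tradeoff concrete, here is a rough back-of-the-envelope comparison using the prices above and the ~3x GPU speedup assumption (actual throughput varies by model and instance, so treat these numbers as illustrative):

# Back-of-the-envelope cost comparison (illustrative numbers only)
cpu_price_per_hour = 0.34   # c6i.2xlarge on-demand price, USD
gpu_price_per_hour = 3.06   # p3.2xlarge on-demand price, USD
gpu_speedup = 3.0           # assumed GPU-over-CPU throughput advantage

price_ratio = gpu_price_per_hour / cpu_price_per_hour   # ~9x more per hour
cost_per_inference_ratio = price_ratio / gpu_speedup    # ~3x more per inference

print(f"GPU costs {price_ratio:.1f}x more per hour")
print(f"At {gpu_speedup:.0f}x speed, GPU is still ~{cost_per_inference_ratio:.1f}x more expensive per inference")

In other words, even granting the GPU a 3x throughput advantage, the CPU instance can come out roughly 3x cheaper per inference at these on-demand prices.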

Training YOLOv7 to Find Inference Performance

To retrain your YOLOv7 model, refer to the blog - How to Train YOLOv7 model on a Custom Dataset.

Evaluation of YOLOv7 Inference Performance Notebook

Let's jump right into the inference performance evaluation notebook. If you haven't re-trained the YOLOv7 model with your own custom dataset, we have provided a model (best.pt) in the GitHub repository for this benchmarking.

Once you have accessed the inference notebook, we recommend you save your own copy. Do this by selecting File > Save a Copy in Drive to fork our notebook to your own Google Drive so you can save your changes.

You're also going to want to confirm your runtime is CPU for this notebook. Go to Runtime > Change Runtime Type > Hardware accelerator > None.

Access the Computer Vision Model within the Colab Notebook

In the training notebook, we trained a model on a custom dataset, then saved that model to Google Drive. In the inference notebook, we mount the Google Drive within the Colab notebook, so we can easily access the model from within the notebook.
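
Mounting Drive inside Colab looks like this (a minimal sketch; the mount point matches the /content/gdrive path used in the commands below):

# Mount Google Drive inside the Colab runtime so the trained model is accessible
from google.colab import drive
drive.mount('/content/gdrive')

# The trained model can then be loaded from a path such as:
# /content/gdrive/MyDrive/TrainedModel/best.pt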

Note: If you do not have a retrained model, you can use the pre-trained model from the Roboflow YOLOv7 repo. It resides in /content/yolov7/runs/best.pt.

Evaluate YOLOv7 performance with Native PyTorch

We first evaluated the performance of our YOLOv7 model with native PyTorch on CPU. We adjusted the custom arguments below to accomplish this. For more details, see the arguments accepted by detect.py.

There are 2 minor changes we've made to run "detect.py" on CPU:

  1. We commented out lines 38 and 39 in detect.py, which apply JIT tracing to the model.
  2. We added lines 84 and 85 to set the device type to "cpu".

These changes were saved to a new file, detect_without_jit.py.
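
To give a sense of what those edits look like, here is a hypothetical sketch (the exact code around lines 38-39 and 84-85 of detect.py differs; TracedModel and the variable names are only shown to illustrate the intent, and torch is already imported in detect.py):

# 1. Comment out the JIT tracing of the model (roughly lines 38-39)
# model = TracedModel(model, device, opt.img_size)

# 2. Force inference onto the CPU (roughly lines 84-85)
device = torch.device('cpu')
model = model.to(device)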

Now, we cd into the yolov7 directory and run evaluation on a sample image using the new detect_without_jit.py file:

If you do not have a trained model from the previous training notebook, run this command:

!python detect_without_jit.py --weights /content/yolov7/runs/best.pt --conf 0.25 --img-size 640 --source UWH-6/test/images/DJI_0021_mp4-32_jpg.rf.0d9b746d8896d042b55a14c8303b4f36.jpg

If you do have a trained model saved in your Google Drive you would like to use, run this command:

!python detect_without_jit.py --weights /content/gdrive/MyDrive/TrainedModel/best.pt --conf 0.25 --img-size 640 --source UWH-6/test/images/DJI_0021_mp4-32_jpg.rf.0d9b746d8896d042b55a14c8303b4f36.jpg

Here we can see the output with the results using native PyTorch:

4 black-hats, 9 bodysurfaces, Done. (901.6ms) Inference, (19.1ms) NMS

INFERENCE TIME WITH NATIVE PYTORCH IS 901.6 ms

The image with the result is saved in: runs/detect/exp/DJI_0021_mp4-32_jpg.rf.0d9b746d8896d042b55a14c8303b4f36.jpg

Done. (2.070s)

We get an inference time of 901.6 ms (your results may vary from run to run).

Here is the output result image with detections:

Evaluate YOLOv7 performance with OpenVINO™ Integration with Torch-ORT

Now we will do the same evaluation but using OpenVINO™ Integration with Torch-ORT. First we need to pip install the torch-ort-infer package:

# Install torch-ort-infer
!pip install torch-ort-infer

Again, we adjusted the custom arguments below to accomplish this. For more details, see the arguments accepted by detect.py.

Here, we have added just 2 lines of code to boost performance with OpenVINO™ Integration with Torch-ORT:

  1. line 17: from torch_ort import ORTInferenceModule
  2. line 71: model = ORTInferenceModule(model)

These changes were saved to a new file, detect_ort.py.
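
If you want to target a specific device or precision rather than the default, torch-ort-infer also exposes provider options. The snippet below is a sketch based on the package's documented usage; the GPU backend and FP16 precision values are illustrative assumptions, not what we used in this CPU benchmark:

from torch_ort import ORTInferenceModule, OpenVINOProviderOptions

# Example: target an Intel® integrated GPU with FP16 precision (values are illustrative)
provider_options = OpenVINOProviderOptions(backend="GPU", precision="FP16")
model = ORTInferenceModule(model, provider_options=provider_options)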

Now we run evaluation on a sample image using the new detect_ort.py file:

If you do not have a trained model from the previous training notebook, run this command:

!python detect_ort.py --weights /content/yolov7/runs/best.pt --conf 0.25 --img-size 640 --source UWH-6/test/images/DJI_0021_mp4-32_jpg.rf.0d9b746d8896d042b55a14c8303b4f36.jpg

If you do have a trained model saved in your Google Drive you would like to use, run this command:

!python detect_ort.py --weights /content/gdrive/MyDrive/TrainedModel/best.pt --conf 0.25 --img-size 640 --source UWH-6/test/images/DJI_0021_mp4-32_jpg.rf.0d9b746d8896d042b55a14c8303b4f36.jpg

Here we can see the output with the results using OpenVINO™ Integration with Torch-ORT:

4 black-hats, 9 bodysurfaces, Done. (778.2ms) Inference, (1.6ms) NMS

INFERENCE TIME WITH OPENVINO™ INTEGRATION WITH TORCH-ORT IS 778.2 ms

The image with the result is saved in: runs/detect/exp2/DJI_0021_mp4-32_jpg.rf.0d9b746d8896d042b55a14c8303b4f36.jpg

Done. (8.239s)

We get an inference time of 778.2 ms.

Here is the output result image with detections:

Summary

Google Colab currently provides an Intel® Xeon® CPU with 2 cores per socket on shared hardware, yet we see a ~13% improvement in performance with OpenVINO™ Integration with Torch-ORT for the YOLOv7 model: inference time drops from 901.6 ms to 778.2 ms, a ~13.7% reduction.

Of course, it is worth noting that we accomplished this with only 2 additional lines of code. As you can see in the results of the test image above, there is no drop in accuracy.

In part 2 of this case study, we will take this a step further and move to AWS. We will showcase even better performance by benchmarking on a 3rd Gen Intel® Xeon® Scalable (Ice Lake) instance, c6i.2xlarge, and by deploying the model on an edge device to run inference with OpenVINO™ Integration with Torch-ORT.

Stay tuned!