"ML in a Minute" is our conversational series on answering machine learning questions. Have questions you want answered? Tweet at us.

What is TensorRT (in 60 Seconds or Fewer)?

TensorRT is an inference framework published by NVIDIA for running machine learning inference on NVIDIA hardware. TensorRT is highly optimized to run on NVIDIA GPUs. It's likely the fastest way to run a model at the moment.

If you're using the NVIDIA TAO Toolkit, we have a guide on how to build and deploy a custom model.

Be sure to subscribe to our channel: https://bit.ly/rf-yt-sub

If You Want to Convert Your Model to TensorRT, How Do You Do That?

To get to TensorRT, you usually start by training in a framework like PyTorch or TensorFlow, and then you need to move your model from that framework into TensorRT, commonly by exporting to an intermediate format like ONNX first.
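
As a rough illustration, here is a minimal sketch of that path in Python, assuming a PyTorch model exported through ONNX and the TensorRT 8+ Python API; the ResNet-18 model, the 1x3x224x224 input shape, and the file names are placeholders for your own network.

import torch
import torchvision
import tensorrt as trt

# 1. Export a trained PyTorch model to ONNX (placeholder model and input shape).
model = torchvision.models.resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=11)

# 2. Parse the ONNX file and build a serialized TensorRT engine.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # optional: use FP16 if the GPU supports it
engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)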

The nice thing is that Roboflow makes it easy to do all these things: https://docs.roboflow.com/inference/nvidia-jetson

Deploy to a Jetson with Roboflow for free

Use Roboflow to manage datasets, train models in one click, and deploy to web, mobile, or the edge. With a few images, you can train a working computer vision model in an afternoon.

CUDA Cores vs Tensor Cores

TensorRT runs on the CUDA cores of your GPU. CUDA is the API your machine learning deployment uses to communicate with the GPU, and on newer NVIDIA GPUs TensorRT can also take advantage of Tensor Cores, specialized units for fast mixed-precision matrix math. Tensor Cores should not be confused with Google's TPUs, which are a separate accelerator. Unless you are working at Google, we do not recommend TPU-based deployment, as it has not grown in the open source ecosystem the way CUDA and TensorRT have.
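
As a quick sanity check (one option among many), you can ask PyTorch whether a CUDA GPU is visible and report its compute capability; Tensor Cores are present on GPUs with compute capability 7.0 and above (Volta and newer).

import torch

# Check that a CUDA-capable GPU is visible and report its compute capability.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), f"compute capability {major}.{minor}")
    print("Tensor Cores available" if major >= 7 else "CUDA cores only")
else:
    print("No CUDA-capable GPU detected")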

How to Install TensorRT

Before you embark on installing TensorRT, we highly recommend that you work from a Linux base, preferably Ubuntu 20.04. If you don't have an Ubuntu server with a GPU, you can spin one up on AWS (for example, a p2.xlarge instance).

Step 1: Install NVIDIA GPU drivers

sudo apt install nvidia-driver-440
sudo reboot
nvidia-smi  # check that the driver and GPU are visible

Step 2: Install CUDA

Download the correct CUDA distribution from NVIDIA, then install it:

sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

Follow the steps in this CUDA installation guide to add the CUDA install locations (such as PATH and LD_LIBRARY_PATH) to your environment.

Step 3: Install TensorRT

Download the correct TensorRT distribution for your system from NVIDIA.

Install it with the following commands, substituting the file you downloaded:

sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.0-trt6.0.1.5-ga-20190913_1-1_amd64.deb
sudo apt-key add /var/nv-tensorrt-repo-ubuntu1804-cuda10.0-trt6.0.1.5-ga-20190913_1-1_amd64/7fa2af80.pub
sudo apt-get update
sudo apt-get install tensorrt
# this is for the Python 2 installation
sudo apt-get install python-libnvinfer-dev
# this is for the Python 3 installation
sudo apt-get install python3-libnvinfer-dev
sudo apt-get install uff-converter-tf
sudo apt-get install onnx-graphsurgeon
# verify the installation
dpkg -l | grep TensorRT
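
If you installed the Python 3 bindings, one quick sanity check is to import the tensorrt module and print its version:

import tensorrt
print(tensorrt.__version__)  # should print the TensorRT version you installed, e.g. 6.0.1.5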

TensorRT Tutorial

Once you have TensorRT installed, you can use it with NVIDIA's C++ and Python APIs.

To get started, we recommend that you check out the open source tensorrtx repository by wang-xinyu. There you will find implementations of popular deep learning models in TensorRT.
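
To give a feel for the Python API, here is a minimal inference sketch, assuming TensorRT 8.x-style calls, pycuda for device memory, the model.engine file built earlier, and a 1x3x224x224 input with a 1000-class output (the shapes are assumptions for illustration):

import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

# Deserialize the engine and create an execution context.
logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate host and device buffers (shapes assumed for illustration).
input_host = np.random.rand(1, 3, 224, 224).astype(np.float32)
output_host = np.empty((1, 1000), dtype=np.float32)
input_device = cuda.mem_alloc(input_host.nbytes)
output_device = cuda.mem_alloc(output_host.nbytes)

# Copy the input to the GPU, run the engine, and copy the result back.
cuda.memcpy_htod(input_device, input_host)
context.execute_v2([int(input_device), int(output_device)])
cuda.memcpy_dtoh(output_host, output_device)
print("Predicted class:", int(output_host.argmax()))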

TensorRT for CPU

TensorRT only accelerates inference on NVIDIA GPUs. If you want to optimize inference on your CPU, you should explore frameworks like OpenVINO and ONNX Runtime.
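
As a rough sketch of the CPU route, ONNX Runtime can run the same exported ONNX file without a GPU; the model.onnx file and the 1x3x224x224 input shape are assumptions carried over from the examples above:

import numpy as np
import onnxruntime as ort

# Run the exported ONNX model on the CPU execution provider.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name  # read the model's actual input name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed shape
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)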

TensorRT for Jetson

You can run TensorRT on your Jetson to accelerate inference speeds. Newer releases of NVIDIA JetPack may already have TensorRT installed. You may also want to start from a base Docker image that already has the installs made for you, such as nvcr.io/nvidia/l4t-ml:r32.5.0-py3.

Conclusion

TensorRT is an inference acceleration library published by NVIDIA that allows you to fully leverage your NVIDIA GPU resources at the cutting edge.

Liked this? Be sure to also check out the computer vision glossary.