"ML in a Minute" is our conversational series on answering machine learning questions. Have questions you want answered? Tweet at us.
What is TensorRT (in 60 Seconds or Fewer)?
TensorRT is a machine learning framework published by NVIDIA to run machine learning inference on NVIDIA hardware. TensorRT is highly optimized to run on NVIDIA GPUs, and it's likely the fastest way to run a model at the moment.
If You Want to Convert Your Model to TensorRT, How Do You Do That?
To get to TensorRT, you usually start by training in a framework like PyTorch or TensorFlow, and then you need to move from that framework into the TensorRT framework. The nice thing is that Roboflow makes it easy to do all of these things: https://docs.roboflow.com/inference/nvidia-jetson
Cuda Cores vs Tensor Cores
TensorRT runs on the CUDA cores of your GPU. CUDA is the direct API that your machine learning deployment uses to communicate with your GPU. Tensor Cores, on the other hand, are specialized units found on newer NVIDIA GPUs that accelerate mixed-precision matrix math, and TensorRT will use them when you enable reduced precision such as FP16. Tensor Cores should not be confused with Google's TPUs, which are a separate accelerator. Unless you are working at Google, we do not recommend TPU-based deployment, as it has not grown in the open source ecosystem the way CUDA and TensorRT have.
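Part of TensorRT's speed comes from running layers in reduced precision such as FP16 where the hardware supports it (this is from NVIDIA's documentation, not the text above). As a quick standard-library illustration of how little accuracy half precision gives up:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("e", struct.pack("e", x))[0]

# FP16 keeps only ~3 decimal digits of precision, which is usually
# plenty for neural network inference.
print(to_fp16(0.1))  # → 0.0999755859375
```

This small rounding error is the trade TensorRT makes for roughly doubling throughput on Tensor Core hardware.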
How to Install TensorRT
Before you embark on installing TensorRT, we highly recommend that you work from a Linux base, preferably Ubuntu 20.04. If you don't have an Ubuntu server with a GPU, you can spin one up on AWS.
Step 1: Install NVIDIA GPU drivers
sudo apt install nvidia-driver-440
sudo reboot
nvidia-smi (to check that the driver is working)
Step 2: Install CUDA
Download the correct CUDA distribution from NVIDIA, then install it:
sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
Follow the steps in this CUDA installation guide to put the CUDA locations into your environment.
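Concretely, that environment step amounts to adding the CUDA binaries and libraries to your paths. A sketch assuming the default install location for CUDA 10.0 (adjust the version directory to match your install):

```shell
# Make the CUDA toolchain (nvcc etc.) visible on PATH
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
# Make the CUDA runtime libraries visible to the dynamic linker
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

Adding these lines to your ~/.bashrc makes them persist across shell sessions.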
Step 3: Install TensorRT
Download the correct TensorRT distribution for your system from NVIDIA.
Install with the following commands, substituting your file:
sudo dpkg -i nv-tensorrt-repo-ubuntu1804-cuda10.0-trt6.0.1.5-ga-20190913_1-1_amd64.deb
sudo apt-key add /var/nv-tensorrt-repo-ubuntu1804-cuda10.0-trt6.0.1.5-ga-20190913_1-1_amd64/7fa2af80.pub
sudo apt-get update
sudo apt-get install tensorrt
sudo apt-get install python-libnvinfer-dev (for a Python 2 installation)
sudo apt-get install python3-libnvinfer-dev (for a Python 3 installation)
sudo apt-get install uff-converter-tf
sudo apt-get install onnx-graphsurgeon
dpkg -l | grep TensorRT (to verify the installation)
Once you have TensorRT installed, you can use it with NVIDIA's C++ and Python APIs.
To get started, we recommend that you check out the open source tensorrtx repository by wang-xinyu. There you will find implementations of popular deep learning models in TensorRT.
TensorRT for CPU
TensorRT itself targets only NVIDIA GPUs, so it is not an option for CPU-only machines; for CPU inference you would look at a CPU-oriented runtime such as ONNX Runtime or OpenVINO instead.
TensorRT for Jetson
You can run TensorRT on your Jetson in order to accelerate inference speeds. Newer distributions of NVIDIA JetPack may already have TensorRT installed. You may also want to start from a base Docker image that already has these installs made for you.
TensorRT is an inference acceleration library published by NVIDIA that allows you to take full advantage of your NVIDIA GPU hardware for cutting-edge inference speeds.
Liked this? Be sure to also check out the computer vision glossary.