Running Tensorflow JS on an NVIDIA Jetson

The NVIDIA Jetson line is a series of AI-capable low-power computers. They range from the $59 Jetson Nano (2GB) to the $899 Jetson AGX Xavier and are a popular choice for powering machine learning projects on the edge.

TensorFlow.js is a library for running machine learning models in JavaScript; it can be deployed to a web browser or to Node.js on a wide variety of platforms.

In this post we will show how to run Tensorflow JS (TFjs) on a Jetson device with GPU acceleration.

Looking to deploy to a Jetson without the headache?

Roboflow enables teams to deploy to a Jetson in an afternoon, no special configuration required. We provide the tools you need for image collection, image organization, annotation, training, and deployment.

In a Web Browser

The easiest way to run Tensorflow.js on a Jetson is in a web browser like Chromium. Using the WebGL backend, it will use the GPU to accelerate inference out of the box with no changes to your codebase necessary.
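
If you want to confirm the GPU-accelerated backend is active in your own page, you can check which backend TensorFlow.js picked once it has initialized. Here is a minimal sketch, assuming you load the standard @tensorflow/tfjs bundle from a CDN:

// in the browser, after loading <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
async function checkBackend() {
  await tf.ready();               // wait for TFjs to initialize its backend
  console.log(tf.getBackend());   // prints "webgl" when the GPU is being used
}
checkBackend();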

Try our Roboflow.js demo with your webcam to test it out. We got speeds of about 4 frames per second on the Xavier NX and 2 frames per second on the Jetson Nano.

In Node.js

But what if you want to run headless? Can you use tfjs-node on a Jetson device? Yes, but Google doesn't provide binaries of libtensorflow or the TFjs C++ bindings for the arm64 (aarch64) platform, so you'll need to compile them yourself, which is quite a chore.

If you try to install @tensorflow/tfjs-node or @tensorflow/tfjs-node-gpu via npm on your Jetson you will get an error:

CPU-linux-3.2.0.tar.gz
* Downloading libtensorflow
(node:10327) UnhandledPromiseRejectionWarning: Error: Unsupported system: cpu-linux-arm64
    at getPlatformLibtensorflowUri (/home/xavier/tfjs/node_modules/@tensorflow/tfjs-node/scripts/install.js:100:11)
    at downloadLibtensorflow (/home/xavier/tfjs/node_modules/@tensorflow/tfjs-node/scripts/install.js:134:7)
    at async run (/home/xavier/tfjs/node_modules/@tensorflow/tfjs-node/scripts/install.js:199:5)

This is because the Jetson uses an arm64 processor, for which no prebuilt libtensorflow binary is available.
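
You can confirm the architecture from a terminal on the Jetson:

# the Jetson reports an arm64 CPU, which the tfjs-node install script has no binaries for
uname -m
# aarch64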

Compiling TFjs for arm64 manually

If you're a glutton for punishment, you might want to try getting TensorFlow.js to work with your Jetson on your own. It took me a solid week of trial and error to get things working, and I'll spare you some of the headaches by listing the "gotchas" I found here. (If you just want to get things working quickly, skip to the next section.)

Steps on the Xavier NX:

  • Create an 8GB swapfile so your build doesn't run out of memory 5 hours into the tensorflow build (see the shell sketch below, after the error log).
  • TensorFlow.js for Node.js requires CUDA 10.0 and cuDNN 7 for hardware acceleration. (This corresponds to NVIDIA JetPack 4.3 for the Jetson Nano. Unfortunately, this version is not available for the Xavier NX, so you're on your own installing the proper CUDA and cuDNN versions.)

    If you don't install the correct versions, you will get an error like this and TFjs will fall back to the CPU, running slowly without GPU acceleration:
2021-02-28 17:19:50.956215: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2021-02-28 17:19:50.956413: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-28 17:19:51.143232: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-02-28 17:19:51.185940: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-02-28 17:19:51.186156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.109
pciBusID: 0000:00:00.0
2021-02-28 17:19:51.186431: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2021-02-28 17:19:51.186693: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2021-02-28 17:19:51.186877: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2021-02-28 17:19:51.187045: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2021-02-28 17:19:51.187240: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2021-02-28 17:19:51.187466: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2021-02-28 17:19:51.187639: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2021-02-28 17:19:51.187675: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
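
For reference, this is roughly how we created the swapfile and checked the CUDA and cuDNN versions before kicking off the build. The paths are typical for a JetPack install but may differ on yours:

# create and enable an 8GB swapfile so the tensorflow build doesn't run out of memory
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# check the CUDA toolkit version (TFjs for Node.js expects 10.0)
/usr/local/cuda/bin/nvcc --version

# check the cuDNN major version (should be 7); the header location varies by JetPack version
grep CUDNN_MAJOR /usr/include/cudnn.h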

The build will take about 7 hours on a Xavier NX. Then you will need to copy the resulting libtensorflow.tar.gz file to a server, create a custom-binary.json file in the package's scripts directory pointing at the hosted file, and run npm install.

This will initially fail to link to the existing C++ bindings included in the package. To fix, run npm rebuild @tensorflow/tfjs-node --build-from-source to rebuild the native addon for arm64.
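
For reference, the custom-binary.json mentioned above is just a small JSON file that tells the install script where to download libtensorflow from instead of Google's servers. The exact key it reads depends on your tfjs-node version (check scripts/install.js); here we assume a tf-lib field and a placeholder URL:

{
  "tf-lib": "https://example.com/path/to/your/libtensorflow.tar.gz"
}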

Easy Mode: Use our Docker container and arm64 build

Luckily, we have already gone through all the pain of getting Tensorflow.js running on an NVIDIA Jetson so you don't have to. Simply pull our roboflow/tfjs-jetson Docker container and install our @roboflow/tfjs-jetson npm package and you'll be in business!

# pull the docker container with CUDA 10.0, CuDNN 7, and Node 14
sudo docker pull roboflow/tfjs-jetson

# run it (and expose the GPU to the container)
sudo docker run -it --device /dev/nvhost-ctrl --device /dev/nvhost-ctrl-gpu --device /dev/nvhost-prof-gpu --device /dev/nvmap --device /dev/nvhost-gpu --device /dev/nvhost-as-gpu roboflow/tfjs-jetson

In real life, you will just want to use the roboflow/tfjs-jetson Docker container as a base for your own image containing your custom application (we will have an example repo up soon that you can clone as a starting point; see the Dockerfile sketch further down for the general idea). But for demonstration purposes, the above docker run command starts in interactive terminal mode. Install our npm package like so:

mkdir app
cd app
npm init
npm install @roboflow/tfjs-jetson

Now, you can create an index.js file that includes the package:

const tf = require("@roboflow/tfjs-jetson");
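
For a quick sanity check that the GPU bindings work, you can extend index.js to run a small operation. This is a minimal sketch, assuming the package re-exports the standard tfjs-node API (as the require line above suggests); a real application would load your own model instead:

const tf = require("@roboflow/tfjs-jetson");

async function main() {
  // a small matrix multiplication forces the CUDA backend to initialize
  const a = tf.randomNormal([1000, 1000]);
  const b = tf.randomNormal([1000, 1000]);
  const c = tf.matMul(a, b);

  const result = await c.data();  // copies the result back from the GPU
  console.log("backend:", tf.getBackend());
  console.log("first value:", result[0]);

  tf.dispose([a, b, c]);          // free the GPU memory
}

main();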

When you run node index.js, you will see the output from TensorFlow confirming that the libraries loaded and your GPU is being used! (Unfortunately, these messages are emitted by the C++ libraries, so I haven't found a way to suppress them.)

2021-02-28 23:36:49.033332: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-02-28 23:36:49.141779: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-02-28 23:36:49.147587: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-02-28 23:36:49.147760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.109
pciBusID: 0000:00:00.0
2021-02-28 23:36:49.147825: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-02-28 23:36:49.151261: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-02-28 23:36:49.154064: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-02-28 23:36:49.154936: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-02-28 23:36:49.158758: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-02-28 23:36:49.161785: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-02-28 23:36:49.170653: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-02-28 23:36:49.170861: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-02-28 23:36:49.171068: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-02-28 23:36:49.171173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2021-02-28 23:36:50.250722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-28 23:36:50.250811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2021-02-28 23:36:50.250841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2021-02-28 23:36:50.251105: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-02-28 23:36:50.251327: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-02-28 23:36:50.251509: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] ARM64 does not support NUMA - returning NUMA node zero
2021-02-28 23:36:50.251665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 455 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)

Congratulations! You now have Tensorflow JS running with CUDA on your Jetson's GPU.
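
As mentioned above, for a real deployment you'll want to bake your application into an image that uses roboflow/tfjs-jetson as its base rather than working interactively. A minimal Dockerfile sketch (the file names are placeholders for your own app):

# Dockerfile
FROM roboflow/tfjs-jetson

WORKDIR /app

# install your dependencies (including @roboflow/tfjs-jetson)
COPY package*.json ./
RUN npm install

# copy in your application code
COPY . .

CMD ["node", "index.js"]

Build the image with docker build and run it with the same --device flags shown in the docker run command above so the container can access the GPU.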

Note: if you get the following error, it means you haven't exposed your Jetson's GPU to the container; be sure to use the --device options above in your docker run command:

2021-02-28 23:27:09.066835: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-02-28 23:27:09.067045: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (31b1216b1342): /proc/driver/nvidia/version does not exist

Stay Tuned

We'll be updating roboflow.js to work in Node.js soon so that you can use your Roboflow Train models on the edge.

In early tests we've seen about a 2.5x speedup in our TFjs models' framerate when inferring via CUDA in Node.js as compared to WebGL in the browser (about 10 fps on the Xavier NX). Check back soon for updates!