Machine learning, the software discipline of mapping inputs to outputs without explicitly programmed relationships, requires substantial computational resources. Traditionally, this has restricted where machine learning models can run to very powerful computers. But this is changing.
Computation is required at two core moments in the machine learning development lifecycle: model training and model inference. Model training is far more resource-intensive than model inference. Training a model requires uncovering complex relationships between inputs and outputs through intensive, repeated trial and error over a dataset. Model inference, on the other hand, only applies relationships that have already been learned. Thus, model inference can run on significantly fewer computational resources.
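To put rough numbers on that gap, here is a minimal back-of-the-envelope sketch for a single dense layer. It assumes two common rules of thumb (a forward pass costs about 2 × inputs × outputs FLOPs per example, and the backward pass costs roughly twice the forward pass). The layer and dataset sizes are hypothetical, chosen only for illustration.

```python
def dense_forward_flops(n_in: int, n_out: int) -> int:
    """Approximate FLOPs to push one example through a dense layer.

    One multiply and one add per weight; bias and activation are ignored.
    """
    return 2 * n_in * n_out


def training_flops(n_in: int, n_out: int, n_examples: int, n_epochs: int) -> int:
    """Forward pass plus a backward pass costing ~2x forward, for every
    example in every epoch (rule-of-thumb assumption, not a measurement)."""
    per_example = 3 * dense_forward_flops(n_in, n_out)
    return per_example * n_examples * n_epochs


def inference_flops(n_in: int, n_out: int) -> int:
    """A single prediction needs only one forward pass."""
    return dense_forward_flops(n_in, n_out)


if __name__ == "__main__":
    # Hypothetical sizes: a 1024 -> 1024 layer, 10,000 examples, 50 epochs.
    train = training_flops(1024, 1024, n_examples=10_000, n_epochs=50)
    infer = inference_flops(1024, 1024)
    print(f"training:      {train:.2e} FLOPs")
    print(f"one inference: {infer:.2e} FLOPs")
    print(f"ratio:         {train // infer:,}x")
```

Under these assumptions, training on 10,000 examples for 50 epochs costs about 1.5 million times the compute of a single prediction. Real ratios depend on the architecture, batch size, and framework overhead.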
At the same time, increasingly powerful models are also shrinking in size. (For example, the weights for YOLOv4-tiny are only 23.1MB.) Moreover, hardware is becoming cheaper and faster. (For example, the NVIDIA Jetson line now includes the Jetson Nano, which has a 128-core NVIDIA Maxwell GPU and costs less than $100.)
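As a rough sanity check on what a 23.1MB weight file implies, the sketch below converts file size into an approximate parameter count. It assumes the file is almost entirely raw float32 weights and that "MB" means MiB (2^20 bytes); both are assumptions, since file-format and metadata overhead aren't specified. It also shows how the same weights would shrink at lower precision, as is common when quantizing models for edge hardware.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}
MIB = 2 ** 20  # assumption: "MB" here means mebibytes


def params_from_size(size_in_mib: float, dtype: str = "float32") -> int:
    """Estimate how many parameters a raw weight file of this size holds."""
    return round(size_in_mib * MIB / BYTES_PER_PARAM[dtype])


def size_mib(n_params: int, dtype: str) -> float:
    """Size in MiB of n_params stored at the given precision."""
    return n_params * BYTES_PER_PARAM[dtype] / MIB


if __name__ == "__main__":
    # YOLOv4-tiny weight size from the text, assuming float32 storage.
    n = params_from_size(23.1)
    print(f"~{n / 1e6:.2f}M parameters")
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {size_mib(n, dtype):5.1f} MiB")
```

Under those assumptions the estimate comes out to roughly 6 million parameters, and storing them as int8 would take about a quarter of the float32 size.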
It is the confluence of better, smaller models and cheaper yet more powerful hardware that gives rise to embedded machine learning.
Embedded machine learning is the practice of deploying machine learning algorithms to run on small, low-power devices such as microcontrollers, single-board computers, and dedicated AI accelerators. This includes running a neural network on a Raspberry Pi, NVIDIA Jetson, Intel Movidius, or Luxonis OAK. Embedded machine learning is a type of edge computing: running algorithms on computational resources close to the end user rather than in a central data center (the cloud).
When to Use Embedded Machine Learning
Why use embedded ML? Embedded machine learning can offer a few key advantages compared to cloud-based processing:
Speed: Without a round-trip to a server for each prediction, results are available with much lower latency.
Connectivity: An internet connection is not required for embedded machine learning.
Privacy: All data processing happens directly on the device where the user is present, so the input data stays local.
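The speed advantage comes down to which stages appear in the latency budget. The sketch below is a toy model, not a benchmark: every number in it is a hypothetical placeholder, and it simply adds up the stages each path has to go through.

```python
from dataclasses import dataclass


@dataclass
class CloudPath:
    upload_ms: float        # time to transfer the input (e.g. an image) to the server
    network_rtt_ms: float   # round-trip network latency
    server_infer_ms: float  # model inference time on the server
    download_ms: float      # time to transfer the prediction back

    def total_ms(self) -> float:
        return self.upload_ms + self.network_rtt_ms + self.server_infer_ms + self.download_ms


@dataclass
class EdgePath:
    device_infer_ms: float  # model inference time on the embedded device

    def total_ms(self) -> float:
        return self.device_infer_ms


if __name__ == "__main__":
    # Hypothetical figures; profile your own network and hardware instead.
    cloud = CloudPath(upload_ms=40, network_rtt_ms=60, server_infer_ms=10, download_ms=5)
    edge = EdgePath(device_infer_ms=45)
    print(f"cloud: {cloud.total_ms():.0f} ms, edge: {edge.total_ms():.0f} ms")
```

Even though the server in this made-up example runs the model faster, the edge path comes out ahead because it skips the network entirely. With a slow or missing connection, the cloud path's total grows or becomes unavailable altogether.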
Embedded machine learning also introduces constraints. Namely, models must be smaller, which often results in lower accuracy. Moreover, incorporating active learning, which accelerates model improvement, can be more challenging because the inputs needed for retraining may arrive late or not at all.
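One common way to soften the active-learning problem is to keep the inputs the model was unsure about on the device and upload them for labeling once a connection is available. Below is a minimal sketch of that idea; the class name, confidence threshold, buffer size, and `upload` callable are all hypothetical and not part of any particular product.

```python
from collections import deque
from typing import Any, Callable


class ActiveLearningBuffer:
    """Hypothetical on-device queue of low-confidence samples for retraining."""

    def __init__(self, threshold: float = 0.5, max_items: int = 100):
        self.threshold = threshold
        # Bounded queue: when full, the oldest sample is dropped, so storage
        # on the device never grows without limit while offline.
        self.pending: deque = deque(maxlen=max_items)

    def observe(self, sample: Any, confidence: float) -> bool:
        """Queue the sample if the model's confidence fell below the threshold."""
        if confidence < self.threshold:
            self.pending.append(sample)
            return True
        return False

    def flush(self, is_online: bool, upload: Callable[[Any], None]) -> int:
        """Upload queued samples if online; return how many were sent."""
        if not is_online:
            return 0
        sent = 0
        while self.pending:
            upload(self.pending[0])  # remove only after a successful upload
            self.pending.popleft()
            sent += 1
        return sent


if __name__ == "__main__":
    buf = ActiveLearningBuffer(threshold=0.6, max_items=3)
    for i, conf in enumerate([0.9, 0.4, 0.55, 0.2, 0.1]):
        buf.observe(f"frame_{i}", conf)
    uploaded = []
    print(buf.flush(is_online=False, upload=uploaded.append))  # offline: nothing sent
    print(buf.flush(is_online=True, upload=uploaded.append))
    print(uploaded)
```

The bounded `deque` is a deliberate choice: on a device with limited storage that may be offline for long stretches, dropping the oldest uncertain samples is usually preferable to running out of space.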
How to Use Embedded Machine Learning in Computer Vision
If you've determined that embedded machine learning is the best option for your use case, the next key steps are: (1) collecting a dataset, (2) developing a model, (3) selecting hardware appropriate for the task at hand, (4) deploying to that hardware, and (5) implementing a system for continued model improvement.
We'll focus the remainder of this post on building and deploying computer vision models, specifically.
Deploying Computer Vision Models to the Edge
In order to deploy a computer vision model to an edge device, that device must be set up with the dependencies the model expects. For example, if you're running TensorFlow on an NVIDIA Jetson, the Jetson must be configured with the CUDA (and cuDNN) versions that your version of TensorFlow supports. (The same is true for any other framework: PyTorch, Caffe, Darknet, etc.) Docker is often a great way to manage dependencies on a given edge device. (Note: we've written about how to use GPUs with Docker previously.)
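Whether dependencies come from a Docker image or a manual install, it helps to fail fast when the environment doesn't match what the model expects. The sketch below uses Python's standard `importlib.metadata` to compare installed package versions against pins. The package names and versions in the example are hypothetical placeholders, and the check covers Python packages only, not the CUDA libraries underneath them.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def matches_pin(installed: str, expected: str) -> bool:
    """True if `installed` starts with the dotted components of `expected`.

    "2.4.1" matches "2.4", but "2.40.0" does not.
    """
    want = expected.split(".")
    return installed.split(".")[: len(want)] == want


def check_pins(pins: dict[str, str]) -> dict[str, tuple[str | None, str]]:
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    problems: dict[str, tuple[str | None, str]] = {}
    for package, expected in pins.items():
        try:
            installed: str | None = version(package)
        except PackageNotFoundError:
            installed = None
        if installed is None or not matches_pin(installed, expected):
            problems[package] = (installed, expected)
    return problems


if __name__ == "__main__":
    # Hypothetical pins; replace with the versions your model was tested with.
    pins = {"tensorflow": "2.4", "numpy": "1.19"}
    for pkg, (got, want) in check_pins(pins).items():
        print(f"{pkg}: installed {got or 'missing'}, expected {want}")
```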
Once the environment is set up with the correct drivers, you may need to build your model framework (in the specific version you require) from source on the edge device. In our experience, building a framework like Darknet or TensorFlow this way can take 14 hours or longer.
With the correct dependencies in place and the model framework built, you can deploy your model to the edge device.
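After deployment, a quick way to confirm the model performs acceptably on the device is to time repeated inference. In this minimal sketch, `predict` stands in for whatever inference call your framework provides, and `dummy_predict` is a hypothetical placeholder so the harness runs on its own.

```python
import statistics
import time
from typing import Any, Callable, Sequence


def benchmark(predict: Callable[[Any], Any], inputs: Sequence[Any], warmup: int = 5) -> dict:
    """Time predict() on each input after a few untimed warm-up calls."""
    # Warm-up calls let lazy initialization (e.g. GPU setup) happen
    # outside the timed region.
    for x in inputs[:warmup]:
        predict(x)
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append(time.perf_counter() - start)
    mean_s = statistics.mean(latencies)
    return {
        "mean_ms": mean_s * 1000,
        "median_ms": statistics.median(latencies) * 1000,
        "fps": 1.0 / mean_s if mean_s > 0 else float("inf"),
        "n": len(latencies),
    }


if __name__ == "__main__":
    def dummy_predict(frame):
        # Placeholder "model": sleeps ~10 ms to mimic inference.
        time.sleep(0.01)
        return frame

    print(benchmark(dummy_predict, list(range(50))))
```

Reporting the median alongside the mean is useful on small devices, where occasional slow frames (for example from thermal throttling) can skew the average.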
At Roboflow, we've also released Docker containers for running computer vision models on your NVIDIA Jetson.
These containers take much of the guesswork out of getting configurations right, so your models run consistently and with high performance.
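For reference, launching a GPU-enabled container on a Jetson from a Python script could look like the sketch below. The image name and port are hypothetical placeholders rather than the names of Roboflow's published containers, and `--runtime nvidia` assumes the NVIDIA container runtime included with JetPack is installed.

```python
import shlex
import subprocess
from typing import List, Optional


def docker_run_args(image: str, port: Optional[int] = None) -> List[str]:
    """Build the argv for `docker run` using the NVIDIA container runtime."""
    args = ["docker", "run", "--rm", "--runtime", "nvidia"]
    if port is not None:
        # Publish the container's port on the same host port.
        args += ["-p", f"{port}:{port}"]
    args.append(image)
    return args


def run_container(image: str, port: Optional[int] = None) -> None:
    """Start the container; raises CalledProcessError if docker exits non-zero."""
    subprocess.run(docker_run_args(image, port), check=True)


if __name__ == "__main__":
    # Hypothetical image name; substitute the image you actually use.
    argv = docker_run_args("example/inference:latest", port=8080)
    print(shlex.join(argv))
```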
As always, good luck building!