When we talk about Deep Learning (DL), we often focus on new SOTA models and how they pass yet another data science milestone. However, there is rarely a conversation about how much engineering work it takes to deploy those models.
Dozens of Python packages, system-level dependencies, CPU architecture, CUDA versions, and quite often sophisticated tools like DeepStream or TensorRT… the list goes on and on. All of that can be overwhelming and hard to manage.
It may seem counterintuitive that adding another technology to your stack will simplify things. Perhaps this is why many people still don’t use Docker for model deployment.
This blog post is not about how to deploy computer vision models with Docker— we already have an excellent tutorial covering that subject–but about why you should use Docker in your Deep Learning ecosystem.
What is Docker?
Docker is a tool for developing and deploying applications using containers. It allows you to package your software, along with all its dependencies, down to the operating system level. The docker ecosystem is built on three pillars:
- Dockerfile - code that tells Docker how to build the Image
- Image - read-only template with instructions for creating a Docker container
- Container - actual software deployed on the server
You can configure Docker to run GPU-accelerated containers on your server and this is a great way to deploy your models, by allowing them to run in isolated environments. Applications can use different versions of Deep Learning libraries and sometimes even CUDA and not affect each other.
See the diagram below to learn more about the host configuration requirements.
Solve the "Works on My Machine" Problem
Whenever someone wants to convince you to use Docker, they start with the same argument. It solves the “works on my machine” problem.
While this is probably the case for most projects, it is certainly not true for Deep Learning. The containerization around CUDA still leaves much to be desired, and a lot depends on getting the host configuration right. However, it is much better than installing the environment directly on the box, especially if you are simultaneously working on multiple projects.
The problems that Docker solves well are compatibility and reproducibility.
AWS Batch, EC2/ECS, Lambda, and SageMaker are all compatible with Docker. The same is true of their counterparts on Azure and Google Cloud Platform (GCP). Docker is probably the only fully generic option that allows you to deploy your model on any of these platforms.
And the same image will work in the cloud and locally, on your PC (assuming your system meets the right hardware requirements and is set up properly), and on edge devices like the NVIDIA Jetson and Raspberry Pi.
Docker is also a great way to learn from others. There’s a good chance that some engineer in the world has already solved a very similar — and not uncommonly the same — problem that you are facing. On top of that, many open source DL projects are starting to adopt Docker and are posting sample Dockerfiles in their repositories.
Hit the Ground Running on Any Hardware
DeepStream or TensorRT are just a few libraries that can give you a performance boost but require tight integration with the underlying hardware. Unfortunately, those technologies often have significant configuration overhead.
Of course, the process becomes even more complicated when you use more exotic devices. In our experience, custom ARM deployments were undoubtedly the most painful. Assembling a fully working Python environment takes a lot of time, as you must build each package from the source and make sure drivers, packages, and system modules are compatible. If you decide to benchmark on a different machine type or deploy to a different edge device, you’ll likely need to start all over.
One of the most significant advantages of using Docker is the possibility of leveraging existing base images.
Docker allows you to skip the plumbing step with a single command and focus on the actual app and model deployment. Take a walk around NVIDIA NGC. You no longer have to wonder what version of CUDA or cuDNN you need. Maybe you will find a base container that will save you a lot of headaches.
Creating CI/CD Flows with Docker
In Deep Learning projects, it is difficult to test GPU-dependent code. Often, it boils down to testing everything on the CPU with the hope that it will be enough or, in extreme cases, mocking everything GPU-related.
But if you’re not going to test the actual code you’re going to run in production, you might as well not even test at all. There are many gotchas around floating point precision, quantization, tensor device management, supported operations, batching, memory management, and more that you really want to catch during development, not after deployment.
> With Docker, we can build a container image and use it at any stage of the deployment process.
Docker allows us to create a CI/CD flow that builds a new image containing all the necessary dependencies with each change to our code. This image can then be sent to a runner with GPU and used to test your application. It’s not the cheapest solution, so it may be worth dialing down the frequency of such tests.
Try Docker to Deploy Computer Vision Models
Docker gives you a quick way to start, iterate and focus on what matters most — your app. The deployment of models is complex enough, so don’t make it even harder. On top of that, when the time comes, running on the Cloud may be easier than you think.
If you plan to deploy a model trained on the Roboflow platform, use our official inference-server Docker image. We offer support for CPU and GPU machines. You will benefit from all the advantages of Docker, and above that, advanced Roboflow features such as active learning.