Deploying Computer Vision Models as Microservices
Roboflow's MLOps philosophy centers on treating your computer vision model as a microservice. The reasons for this are myriad; in this post we highlight the benefits of the approach and how it works in practice.
Why Microservices?
Separation of Concerns
The primary motivation behind deploying your model as a microservice is that it separates concerns. Computer vision models can be finicky, requiring specific dependencies and sometimes even specialized hardware. You don't want to have to build your entire application around these specialized constraints.
With a microservice, your vision model can run with the specific versions of Ubuntu, Python, CUDA, and TensorFlow it needs (for example), while your application code runs in Node.js, Go, or C#.
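To make that boundary concrete, here's a minimal sketch of what the application side of the contract can look like. The endpoint URL and response shape are assumptions for illustration; we've written it in Python for brevity, but the same HTTP call could just as easily be made from Node.js, Go, or C#.

```python
import requests  # the application only needs an HTTP client, not CUDA or TensorFlow

# Hypothetical address where the model microservice is listening.
INFERENCE_URL = "http://vision-service.internal:9001/infer"

def detect_objects(image_bytes: bytes) -> dict:
    """Send an image to the model microservice and return its predictions."""
    response = requests.post(
        INFERENCE_URL,
        files={"image": ("frame.jpg", image_bytes, "image/jpeg")},
    )
    response.raise_for_status()
    return response.json()  # e.g. {"predictions": [...]}
```

Notice that nothing in this code knows (or cares) which framework, CUDA version, or OS the model runs on. That's the whole point: the contract is just HTTP.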
Scalability and Cost
In many situations, usage of your model will be bursty: it will sit idle waiting for something to trigger it, then handle a flurry of activity all at once.
For example, let's say your model is monitoring 20 security camera feeds at a worksite. You could connect each camera to a powerful machine that runs your model locally. But a better approach is to connect each camera to a cheap, low-power device that calls a microservice running your model only when motion is detected.
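Here's a rough sketch of that pattern, assuming the same hypothetical /infer endpoint from above. The low-power device does only cheap frame differencing locally and ships a frame to the model microservice when something changes; the threshold value is a made-up tuning knob that would depend on your camera and scene.

```python
import time
import cv2  # OpenCV for camera capture and simple frame differencing
import requests

INFERENCE_URL = "http://vision-service.internal:9001/infer"  # hypothetical endpoint
MOTION_THRESHOLD = 500_000  # illustrative value; tune per camera and scene

def frame_changed(prev, curr) -> bool:
    """Crude motion check: total absolute pixel difference between frames."""
    diff = cv2.absdiff(prev, curr)
    return diff.sum() > MOTION_THRESHOLD

cap = cv2.VideoCapture(0)
_, prev = cap.read()
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_changed(prev, frame):
        # Only now do we pay for heavyweight inference.
        _, jpg = cv2.imencode(".jpg", frame)
        requests.post(
            INFERENCE_URL,
            files={"image": ("frame.jpg", jpg.tobytes(), "image/jpeg")},
        )
    prev = frame
    time.sleep(0.1)  # poll ~10 fps; the expensive model runs only on motion
```

Twenty devices like this can share one autoscaling inference service, so you pay for GPU time only when there's actually something to look at.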
Speed of Iteration
With a microservice, updating your model doesn't mean re-deploying your entire app (which, in the case of a mobile app, could mean days spent waiting for review, or, for software deployed on physical hardware, weeks or months).
Additionally, if you're working on a team, the folks responsible for your computer vision model can iterate independently of the application's overall release cadence.
How Does It Work with Roboflow?
Roboflow supports a standardized inference API for your models that works across several different platforms. You can test against our autoscaling Hosted API, then deploy the exact same model via a Docker container to devices like the NVIDIA Jetson, a server on your private cloud, or the Luxonis OAK.
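Here's a sketch of what that looks like in practice, following the URL pattern of Roboflow's hosted detection API. The model ID, version, and API key below are placeholders, and the local port is an assumption about how you've configured the container; the key idea is that swapping deployment targets is just a base-URL change.

```python
import base64
import requests

# Swapping between the hosted API and a local container is just a base-URL change.
HOSTED = "https://detect.roboflow.com"
LOCAL = "http://localhost:9001"  # e.g. the inference container on a Jetson

def infer(base_url: str, model_id: str, version: int,
          image_path: str, api_key: str) -> dict:
    """Run inference against whichever deployment target base_url points at."""
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    response = requests.post(
        f"{base_url}/{model_id}/{version}",
        params={"api_key": api_key},
        data=encoded,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    response.raise_for_status()
    return response.json()

# The same call works against either target:
# infer(HOSTED, "my-project", 3, "frame.jpg", "YOUR_API_KEY")
# infer(LOCAL, "my-project", 3, "frame.jpg", "YOUR_API_KEY")
```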
When you train a new model, updating just means changing the version number referenced in a configuration file. Or you can migrate users gradually via a staged rollout to ensure your new model performs just as well in the real world.
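One way to express that, sketched here with hypothetical config keys: the version is just data, so promoting a new model never requires redeploying the application itself.

```python
import random

# Hypothetical deployment config: bump "stable_version" to promote a model,
# or dial "candidate_traffic" up gradually for a staged rollout.
MODEL_CONFIG = {
    "model_id": "my-project",
    "stable_version": 3,
    "candidate_version": 4,
    "candidate_traffic": 0.10,  # send 10% of requests to the new model
}

def pick_version(config: dict) -> int:
    """Staged rollout: route a fraction of traffic to the candidate model."""
    if random.random() < config["candidate_traffic"]:
        return config["candidate_version"]
    return config["stable_version"]
```

Once the candidate's real-world results match the stable model's, you flip candidate_traffic to 1.0 (or promote the version number) and the rollout is done, with no app release in sight.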