Using the Upload API to Collect Images from the Wild
The key to production-quality machine learning models is continuous iteration and improvement. The first step is getting a model that is "good enough" for your first version. But once you deploy to the real world, you'll invariably find edge cases that confuse your model.
Collecting data from the field to continually iterate on and improve your model over time is incredibly important. Deploying your first model is the starting point, not the finish line. As your model gets better, your product will see wider usage, which will help you collect more data and improve your model further in a virtuous cycle.
This is how data moats are built, and they can form a lasting competitive advantage as you scale. You can read more about the process in this post about active learning.
The Roboflow Upload API
Roboflow is here to help with that process of continuous improvement. Today we're making our Upload API generally available to all users. By integrating the Upload API into your app, you can add images to your datasets directly from your deployed applications.
Getting started is easy: just head over to the API section of your account page and create your API key.
Treat this key like a password because it can be used to access and modify data on your Roboflow account. (We will be adding more API functionality as time goes on.)
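One common way to treat the key like a password is to keep it out of your source code and read it from the environment at startup. The snippet below is a minimal sketch of that idea; the variable name `ROBOFLOW_API_KEY` and the helper `load_api_key` are our own illustrative choices, not something the API requires.

```python
import os


def load_api_key(env=os.environ, var_name="ROBOFLOW_API_KEY"):
    """Read the API key from the environment instead of hard-coding it.

    `var_name` is a hypothetical name chosen for this example.
    """
    key = env.get(var_name, "").strip()
    if not key:
        # Fail loudly at startup rather than at the first upload attempt.
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key
```

Calling `load_api_key()` once when your app starts means the key never has to appear in your repository or in client-visible code.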
Then, create a dataset inside of Roboflow to hold your uploaded images. Keeping uploaded images separate from your production data will allow you to easily keep track of which images need to be labeled and reviewed. It will also let you filter out junk images without polluting your main dataset. Grab the dataset's identifier from the URL; this is how you'll tell the Upload API which dataset you'd like to add your images to.
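To make the role of the dataset identifier concrete, here is a rough sketch of how an upload request might be assembled in Python. The URL template, the `api_key` and `name` query parameters, and the base64-encoded body are all assumptions made for illustration; the real endpoint and parameters are the ones given in the Upload API documentation, and `build_upload_request` is a hypothetical helper.

```python
import base64
import urllib.parse
import urllib.request

# Placeholder, NOT the real endpoint: copy the actual URL from the Upload API docs.
UPLOAD_URL_TEMPLATE = "https://api.example.com/dataset/{dataset_id}/upload"


def build_upload_request(image_bytes, filename, dataset_id, api_key,
                         url_template=UPLOAD_URL_TEMPLATE):
    """Build (but do not send) a POST request that targets one dataset.

    The query parameter names and the base64 body are assumptions for this sketch.
    """
    query = urllib.parse.urlencode({"api_key": api_key, "name": filename})
    # The dataset identifier from the URL selects where the image will land.
    path = url_template.format(dataset_id=urllib.parse.quote(dataset_id, safe=""))
    body = base64.b64encode(image_bytes)
    return urllib.request.Request(
        f"{path}?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Once the template points at the real endpoint, the request could be sent with `urllib.request.urlopen(req)`. Building the request separately from sending it makes the dataset routing easy to check in a unit test.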
Find the code snippet for your programming language of choice in our Upload API documentation and add it to your application. We recommend giving your users a way to flag when your model is performing poorly so that you can collect examples of the specific edge cases that will most improve your model. But uploading a random sampling of real-world images is better than nothing and can be a good first step.
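The two collection strategies above (user-flagged failures plus a random sample) can be combined in a small policy function. This is only a sketch under our own assumptions: `should_upload` is a hypothetical helper, and the 5% default sample rate is an arbitrary illustrative value that you would tune to your traffic and labeling budget.

```python
import random


def should_upload(user_flagged, sample_rate=0.05, rng=random):
    """Decide whether an image should be sent to the upload dataset.

    Images the user flagged as bad predictions are always kept; everything
    else is kept with probability `sample_rate` (illustrative default).
    """
    if user_flagged:
        return True
    # rng.random() returns a float in [0.0, 1.0).
    return rng.random() < sample_rate
```

In an app, you might call `should_upload(user_flagged=True)` from a "report a problem" button handler and `should_upload(user_flagged=False)` after every ordinary prediction.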
Then, every so often, use Roboflow Annotate to label a batch of images, merge them into your main dataset, train a new version of your model, and push it to production. Your model will continue to improve each time you go through this cycle.
We've been testing this flow in private beta for the past few months and already have several customers using the Upload API in the wild as a core component of their computer vision workflow. We're excited to see what you'll build!