What Does "End to End" Really Mean?
Developing, deploying and optimizing computer vision models used to be a cumbersome, painful process. With Roboflow, we sought to democratize this technology, which (first and foremost) meant knocking down the barriers we saw preventing everyday people from exploring and implementing computer vision in their work and daily lives.
The natural result of this undertaking was a true “end-to-end” solution: a product that enables users to start with a set of raw images and, in the span of an afternoon, create a fully trained computer vision model. The only necessary ingredients are a laptop, a wifi connection and a bit of curiosity. (You don’t even need source images of your own; we offer entire datasets of public images that you can “fork” into your Roboflow account to get started.)
Once a model has been trained and deployed, we give our users the tools they’ll need to improve model performance iteratively - by sending inference data back into Roboflow to be annotated, then used to retrain and redeploy the model. This process is called Active Learning, and it’s the final step in closing the machine learning loop.
Getting Started
There are a number of steps necessary to create a functioning computer vision model. First, you need images, and although this might be surprising to some, we’ve learned that quality is more important than quantity on this front.
The best source images are representative images - meaning:
- They contain the object(s) you’re hoping to teach your model to detect, and
- they closely resemble the context in which your model will “see” those objects in deployment.
A little clarity on this second point: if you’re training a model to detect fish underwater, for example, your source images should be primarily pictures of fish underwater - not fish above water, in broad daylight, on a boat or hanging from a line.
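If you’d rather script this step than drag and drop, here’s a rough sketch of uploading a folder of local images to a Roboflow project with our Python package (pip install roboflow). The API key, workspace, project and folder names below are placeholders, and the exact calls may vary slightly by package version.

```python
# A rough sketch: upload a local folder of representative images to a
# Roboflow project with the Python package (pip install roboflow).
# The API key, workspace, project and folder names are placeholders.
import glob

from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_PRIVATE_API_KEY")
project = rf.workspace("your-workspace").project("underwater-fish")

for image_path in glob.glob("fish_photos/*.jpg"):
    project.upload(image_path)  # each image lands in the project, ready to annotate
```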
Annotation
When you step outside your front door each morning, your brain instantaneously (and subconsciously) recognizes all the objects around you - a tree, a sidewalk, a passing car. These inputs are vast and varied, and you can recognize them even in varying contexts and conditions. As we set out to create a computer vision model, we undertake the ambitious task of teaching a computer to see the world as we see it - and so the next logical step in our process is to annotate, or label, the images in our dataset.
Annotation is the human process by which images are labeled, or assigned metadata tags. This process is key to developing your computer vision model; it’s how we show the model what we want it to learn.
At Roboflow, we offer bounding box annotations. As you can see from many of the images on our website, this looks like a brightly colored rectangular outline of the object(s) you’re hoping to teach your model to detect. Our annotation tool is user-friendly, accessible for teams or individuals, and contains a number of time-saving features to make this notoriously labor-intensive step a whole lot easier, including Label Assist, one of our most popular features on the Pro plan.
Generate a Version
In preparation for training a model, you’ll want to preprocess and augment the images in your dataset. Preprocessing steps are designed to help your model train and run inference faster (for example, resizing all images so they’re uniform, or making them all greyscale), while augmentation steps are meant to improve model resiliency in deployment by showing it examples of images with various distortions that simulate conditions it may encounter in the real world - shifts in brightness or hue, rotation, and the like.
Adding augmentation to those source images inevitably means increasing the size of the dataset; we’re copying those pictures and altering them just slightly - just so - to help the resulting computer vision model understand that a fish is a fish, even in murky waters.
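To make that concrete, here’s an illustrative sketch in plain Python with Pillow - not Roboflow’s internal implementation - of what a resize and grayscale preprocessing step and a couple of simple augmentations look like. In Roboflow you simply tick these options when generating a version and we apply them for you; the image name and parameter values here are just examples.

```python
# Illustrative only: a rough sketch of resize/grayscale preprocessing and
# brightness/rotation augmentations in plain Python (Pillow). Roboflow
# applies equivalent steps automatically when you generate a version.
from PIL import Image, ImageEnhance

img = Image.open("fish_001.jpg")

# Preprocessing: make every image the same size, optionally grayscale.
img = img.resize((416, 416))
gray = img.convert("L")

# Augmentation: create slightly distorted copies to grow the dataset.
brighter = ImageEnhance.Brightness(img).enhance(1.3)  # +30% brightness
rotated = img.rotate(15, expand=True)                 # 15 degree rotation

for name, variant in [("gray", gray), ("bright", brighter), ("rot", rotated)]:
    variant.save(f"fish_001_{name}.jpg")
```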
Each variation of your dataset creates a new “version” in Roboflow. These versions are frozen in time; each one a snapshot that allows you to experiment with different models, frameworks, and hyperparameters without inadvertently changing other variables that could invalidate your results.
Train a Model
Finally, it’s time to train a model. This is the big dance. We offer the ability to export your datasets for use with your own custom models in all of the common machine learning frameworks, as well as code-free integrations with external training pipelines like AWS Rekognition, Google AutoML and Microsoft Azure - but we encourage our users to try Roboflow Train, a training option built directly into Roboflow.
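(If you do bring your own training code, here’s a hedged sketch of pulling a dataset version down through our Python package; the version number and export format are just examples, and the call details may differ by package version.)

```python
# A rough sketch: export a frozen dataset version for an external training
# pipeline. The version number (1) and format string ("coco") are examples;
# Roboflow supports many common export formats.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_PRIVATE_API_KEY")
project = rf.workspace("your-workspace").project("underwater-fish")

dataset = project.version(1).download("coco")
print(dataset.location)  # local folder with the images and annotations
```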
Roboflow Train is an automated machine learning solution that our customers can use to transform any dataset into a trained computer vision model, ready for deployment. There are a number of benefits to using Roboflow Train, including improved model performance, the simplicity of a single-click option, and the aforementioned Label Assist feature.
Roboflow Train also makes it easier to build and automate an Active Learning pipeline, so that model performance improves iteratively over time using inference data. More on this a bit further down.
Streamlined Deployment
Start getting predictions from your model on real-world images by deploying through Roboflow. This is the (most) fun part - especially if you get the opportunity to show family, friends or colleagues how the model performs. You’re welcome to use any of the deployment destinations available within Roboflow, including hosted web inference and on-device deployment (NVIDIA Jetsons, OAK devices, or even a web browser).
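As a sketch, here’s roughly what calling hosted inference from our Python package looks like once a model version has been trained. The project name, image path and confidence/overlap thresholds are placeholders.

```python
# A minimal sketch: run hosted inference against a trained model version.
# Names, paths and thresholds below are placeholders.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_PRIVATE_API_KEY")
model = rf.workspace("your-workspace").project("underwater-fish").version(1).model

result = model.predict("new_fish_photo.jpg", confidence=40, overlap=30)
print(result.json())          # classes, confidences and bounding boxes
result.save("annotated.jpg")  # the same image with predictions drawn on it
```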
Here are a few models in the wild that you can check out today:
- (Okay, this one is a little touchy.) This model doesn't just "see" hands - it recognizes the common gestures for rock, paper and scissors.
- Have a mask handy? See if this model knows when you're wearing it...and when you're not.
Active Learning
Models in the wild are exposed to real-world inference data that will inevitably differ from the images they were trained on. It’s important to monitor for failure cases and feed them back into your training pipeline so the model gets better over time.
The process of collecting and annotating inference data, then re-training on it, is called “Active Learning” - and in combination with Transfer Learning (training a new model from a previous model’s checkpoint), our customers are empowered to create computer vision models that learn iteratively and can eventually outperform their human counterparts on speed, accuracy and consistency.
Models, like people, can have blind spots - and images containing the classes (or objects) your model is most likely to misidentify are the highest-leverage examples to feed back into Roboflow to help it improve.
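To sketch what this can look like in practice, here’s a rough example of a simple active learning loop using our Python package: run inference on incoming frames, and send the ones the model is least sure about back to the project for annotation. The confidence threshold, folder name and project details are illustrative assumptions, not a prescription.

```python
# A rough sketch of an active learning loop: keep the frames the deployed
# model is least confident about and send them back for annotation.
# The threshold (0.4), folder and project names are illustrative assumptions.
import glob

from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_PRIVATE_API_KEY")
project = rf.workspace("your-workspace").project("underwater-fish")
model = project.version(1).model

for image_path in glob.glob("camera_frames/*.jpg"):
    predictions = model.predict(image_path, confidence=10).json()["predictions"]
    # No detections, or only low-confidence ones, hint at a blind spot.
    if not predictions or min(p["confidence"] for p in predictions) < 0.4:
        project.upload(image_path)  # queued in Roboflow to be annotated and retrained on
```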
In this way, Active Learning is an ongoing process, which is why Roboflow is a solution designed to be used not once, but over a long period of time to create the highest performing models in the industries we serve.
Let’s Build
Are you ready to give it a whirl? Create a free Roboflow account today, and upload (or fork) your first dataset of images. Contact us to set up a personalized demo of the above steps, or to receive specific guidance around your unique use case. Alternatively, you're welcome to tune into our weekly live demo (register here) to see these steps in action. You can own a piece of this incredible technology - Roboflow exists to power your efforts, every step of the way, from end to end.