When building a computer vision application from scratch, two of the most time-consuming parts of the process are finding data to train the model and quickly testing the model’s performance.
As of today, Roboflow is making both easier for all users by supporting YouTube video links for uploading data and for testing model performance in the browser. Whether you’re building a proof-of-concept use case, kickstarting a new model without proprietary data, or expanding your dataset to improve model performance, you can leverage YouTube’s 800+ million videos to get to value faster.
Using YouTube Videos for Computer Vision Training Data
Videos provide an easy way to gather thousands of unique images for training data because the video frames can be split into distinct images. Using videos as training data helps both image and video inference applications, so even if you’re not building a model for video inference, you can benefit from the ease of converting videos to images for training purposes.
To use a YouTube video for training data, head to the Upload page, paste the YouTube video URL into the Import field, and choose the rate at which images are sampled from the video. Before anything is added to your dataset, you’ll see the number of images that will be generated and made available to label.
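To make the sampling rate concrete, here is a minimal sketch of how a fixed rate maps to frame indices and to the number of images produced. The function name `sampled_frame_indices` and its parameters are hypothetical, chosen for illustration; this is not Roboflow's implementation, only the arithmetic behind "frames per second sampled".

```python
def sampled_frame_indices(
    total_frames: int, video_fps: float, samples_per_second: float
) -> list[int]:
    """Return the indices of the frames kept when sampling at a fixed rate.

    Assumption: sampling starts at frame 0 and keeps one frame every
    (video_fps / samples_per_second) frames.
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if video_fps <= 0 or samples_per_second <= 0:
        raise ValueError("video_fps and samples_per_second must be positive")

    # A video cannot be sampled more often than it has frames,
    # so the step is never smaller than one frame.
    step = max(video_fps / samples_per_second, 1.0)

    indices = []
    i = 0
    while int(i * step) < total_frames:
        indices.append(int(i * step))
        i += 1
    return indices


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled once per second.
    frames = sampled_frame_indices(total_frames=300, video_fps=30.0, samples_per_second=1.0)
    print(len(frames), "images would be generated")
```

Lowering the rate gives fewer, more distinct images to label; raising it gives more images, many of which will be near-duplicates of their neighbors.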
From this point, you can use Roboflow Annotate to label the data, generate your dataset, and then use Roboflow Train to create a model. For this project, we chose to use the Smart Polygon tool to apply precise labels and help increase the accuracy of our model.
Our dataset consists of only 58 annotated images, and we used augmentations to expand it to 418 images for training the model. With a small amount of well-labeled data, we can create a highly performant model. For specific tasks, you do not always need thousands of images to get to value.
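As an illustration of what an augmentation does, the sketch below applies a horizontal flip to an image and to its polygon label so the two stay aligned. It is an assumption-laden example: the post does not say which augmentations were used, the function name `flip_horizontal` is hypothetical, and the image is represented as a plain list of pixel rows with polygon points given as integer pixel indices.

```python
Point = tuple[int, int]


def flip_horizontal(
    image: list[list[int]], polygon: list[Point]
) -> tuple[list[list[int]], list[Point]]:
    """Mirror an image left to right and move the polygon label with it.

    Assumption: polygon points are (x, y) pixel indices, so the pixel in
    column x moves to column (width - 1 - x).
    """
    if not image or not image[0]:
        raise ValueError("image must have at least one row and one column")

    width = len(image[0])
    flipped_image = [list(reversed(row)) for row in image]
    flipped_polygon = [(width - 1 - x, y) for x, y in polygon]
    return flipped_image, flipped_polygon


if __name__ == "__main__":
    image = [
        [1, 2, 3],
        [4, 5, 6],
    ]
    polygon = [(0, 0), (1, 0), (1, 1)]
    new_image, new_polygon = flip_horizontal(image, polygon)
    print(new_image)
    print(new_polygon)
```

Each augmented copy is a new training example built from an existing label, which is how a small annotated set can grow without additional labeling work.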
Testing Computer Vision Model Performance with YouTube Videos
Once you have a trained model, you can use the Deploy page in Roboflow to see how well it performs on the video itself. If you followed labeling best practices, you should expect the model to perform well on the video you used to create the training data.
Your mAP score, precision, and recall are helpful indicators of how your model will perform, but watching inferences populate in real time while you adjust the confidence or overlap thresholds will show you where the model is having issues and whether it's working well enough to deploy into production.
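To show what the two thresholds control, here is a minimal sketch of confidence filtering followed by greedy non-maximum suppression. It rests on an assumption: that the overlap threshold is the maximum intersection-over-union (IoU) allowed between two kept boxes of the same class. The `Detection` class and `filter_detections` function are hypothetical names for illustration, not Roboflow's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    confidence: float
    label: str


def iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    intersection = inter_w * inter_h
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def filter_detections(
    detections: list[Detection],
    confidence_threshold: float,
    overlap_threshold: float,
) -> list[Detection]:
    """Drop low-confidence boxes, then suppress overlapping same-class boxes."""
    # Step 1: keep only boxes at or above the confidence threshold,
    # ordered from most to least confident.
    candidates = sorted(
        (d for d in detections if d.confidence >= confidence_threshold),
        key=lambda d: d.confidence,
        reverse=True,
    )

    # Step 2: greedily keep a box unless it overlaps an already-kept box
    # of the same class by more than the overlap threshold.
    kept: list[Detection] = []
    for candidate in candidates:
        if all(
            candidate.label != k.label or iou(candidate, k) <= overlap_threshold
            for k in kept
        ):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    detections = [
        Detection(0, 0, 10, 10, 0.9, "greyhound"),
        Detection(1, 1, 11, 11, 0.8, "greyhound"),    # near-duplicate of the first box
        Detection(50, 50, 60, 60, 0.3, "greyhound"),  # low-confidence box elsewhere
    ]
    for d in filter_detections(detections, confidence_threshold=0.5, overlap_threshold=0.5):
        print(d)
```

Raising the confidence threshold removes uncertain boxes at the cost of missed objects, while raising the overlap threshold lets more closely overlapping boxes through, which matters when objects crowd together in the frame.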
Using YouTube videos to train a model is a great way to prove a use case or help show other stakeholders what you can do with computer vision models.
You can now test your trained model on other videos to see whether new environments create edge cases you would want to account for in the next version of your model. For example, using a different clip (below) shows the model performs well with new greyhounds, new numbers, new colors, and new camera angles.
However, the head-on view and the logo in the upper left corner appear to have given the model trouble. We should upload that data and use it to generate a new dataset to train a new model. At this point, you can continue adding more YouTube videos to collect more data and improve your model's performance in new environments until it's ready to be deployed into production.
In scenarios where you have multiple fixed-location cameras, you can very quickly build highly performant models this way. This cycle of testing a model, collecting the data it struggles with, and retraining is known as active learning, and it’s an important method for improving model performance over time.
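One simple way to decide which frames are worth uploading for the next round is to flag the frames where the model was unsure. The sketch below is an illustration under stated assumptions: the function name `select_frames_for_relabeling`, the input format (frame index mapped to the confidence scores of that frame's detections), and the default band of 0.2 to 0.6 are all hypothetical, and frames with no detections are flagged on the assumption that an object was expected in every frame.

```python
def select_frames_for_relabeling(
    frame_confidences: dict[int, list[float]],
    low: float = 0.2,
    high: float = 0.6,
) -> list[int]:
    """Return frame indices the model was uncertain about.

    A frame is selected when it has no detections at all, or when at least
    one detection has a confidence in the uncertain band [low, high).
    """
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("expected 0 <= low < high <= 1")

    selected = []
    for frame_index in sorted(frame_confidences):
        scores = frame_confidences[frame_index]
        if not scores or any(low <= score < high for score in scores):
            selected.append(frame_index)
    return selected


if __name__ == "__main__":
    per_frame = {
        0: [0.95],        # confident detection, nothing to review
        30: [0.4, 0.9],   # one uncertain detection
        60: [],           # nothing detected
        90: [0.1],        # below the band, treated as noise here
    }
    print(select_frames_for_relabeling(per_frame))
```

The selected frames are the ones to label and add to the next dataset version, so each retraining round focuses on the cases the current model handles worst.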
800+ Million Videos for Training Data and Inference Testing
Roboflow is on a mission to democratize computer vision, and making it easier to collect training data puts computer vision into the hands of more developers. All of the features highlighted here are included in our free Public tier and available to all users.