Revamping Train, Validation, Test, Split Management
Published Nov 20, 2020 • 2 min read

Splitting data into train, validation, and test splits is essential to building good computer vision models. We are excited to announce in-app changes to Roboflow that make it even easier to manage your train, validation, and test splits as you work through the computer vision workflow.

Splitting Data on Upload

As before, you can split your dataset into train, validation, and test splits in the upload flow. You can choose to keep the splits implied by your folder structure, or randomly shuffle images between the train, validation, and test splits.
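For readers who like to see the idea in code, a random shuffle into three splits can be sketched in a few lines of Python. This is only an illustration of the concept, not Roboflow's implementation: the function name `random_split`, the split names, the default ratios, and the seed are all assumptions made for the example.

```python
import random

def random_split(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle items and cut them into train/valid/test lists by ratio.

    Hypothetical helper for illustration only.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_valid = round(len(shuffled) * ratios[1])
    # Whatever remains goes to test, so no image is dropped or duplicated.
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }

# Made-up filenames standing in for an uploaded dataset.
images = [f"img_{i:03d}.jpg" for i in range(100)]
splits = random_split(images)
print({name: len(files) for name, files in splits.items()})
```

Fixing the seed is a design choice for the sketch: it makes the same images land in the same split each time, which keeps experiments comparable.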

Changing splits at upload time

Changing splits on upload is a start, but many users requested the ability to change train/validation/test splits within a Roboflow dataset. We heard you! We're excited to announce a solution.

Splitting Data within the Dataset Page

We thought splitting your dataset into different splits should be as easy as making other modifications to your dataset. So now, in the Modify Dataset tab, underneath Train/Test Split, there is an Edit Splits button. To change your splits, simply toggle that button and pick the desired split distribution. Once the split looks good, hit Save Splits and the new dataset splits will be saved to the database.

Changing splits during dataset modification

Notes on in-app split changing

The new split will be reflected in future version exports (and remember, augmentations only multiply the training set). Past dataset versions do not change, as they are meant to be a source of record.

Images are moved between splits only as needed to close the gap between the current and the desired distribution. The rest of your dataset is not randomly reshuffled.

If you want to curate a specific testing set, the way to do that is still to organize images outside of Roboflow and upload them with the option Add all Images to Testing Set.
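The note above about moving images only to cover the difference, rather than reshuffling everything, can be illustrated with a small Python sketch. It is a guess at the general idea, not Roboflow's actual logic: the function name `rebalance`, the split names, and the rule for which images get moved (here, the last ones in each oversized split) are assumptions.

```python
def rebalance(splits, targets):
    """Move as few items as possible so each split reaches its target size.

    splits  : dict of split name -> list of items (current assignment)
    targets : dict of split name -> desired item count
    Hypothetical helper for illustration only.
    """
    total = sum(len(items) for items in splits.values())
    if sum(targets.values()) != total:
        raise ValueError("target counts must sum to the dataset size")
    result = {name: list(items) for name, items in splits.items()}
    # Collect the surplus from splits that are larger than their target.
    pool = []
    for name, items in result.items():
        surplus = len(items) - targets[name]
        if surplus > 0:
            pool.extend(items[-surplus:])
            del items[-surplus:]
    # Hand the surplus to splits that are smaller than their target.
    for name, items in result.items():
        deficit = targets[name] - len(items)
        if deficit > 0:
            items.extend(pool[:deficit])
            del pool[:deficit]
    return result

# Made-up dataset: 100 image ids currently split 80/10/10.
current = {
    "train": list(range(0, 80)),
    "valid": list(range(80, 90)),
    "test": list(range(90, 100)),
}
updated = rebalance(current, {"train": 70, "valid": 20, "test": 10})
print({name: len(items) for name, items in updated.items()})
```

In this example only ten images change split, all from train to valid. The test set, and every image that did not need to move, stays where it was.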

When to edit your splits

It is tempting to move as many images as possible into the training set so your model sees the most variety. However, this significantly degrades the quality of the evaluation metrics you rely on to see how well your model is learning. At Roboflow, we recommend a 70% train/20% validation/10% test split to get the most out of your training set while still getting a good look at evaluation metrics.
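To make that trade-off concrete, here is a rough Python sketch that turns a 70/20/10 ratio into image counts and estimates how noisy a metric becomes as the evaluation set shrinks. It rests on a simplifying assumption: each evaluation image is treated as an independent right-or-wrong trial, so the binomial standard error applies. Detection metrics such as mAP are more involved, and the function names and the dataset size are made up for the example.

```python
import math

def split_counts(n_images, ratios=(0.7, 0.2, 0.1)):
    """Return (train, valid, test) image counts for the given ratios."""
    n_train = round(n_images * ratios[0])
    n_valid = round(n_images * ratios[1])
    # The remainder goes to test so the counts always add up to n_images.
    return n_train, n_valid, n_images - n_train - n_valid

def accuracy_standard_error(accuracy, n_eval):
    """Binomial standard error of an accuracy measured on n_eval images."""
    return math.sqrt(accuracy * (1 - accuracy) / n_eval)

n_train, n_valid, n_test = split_counts(1000)
print(n_train, n_valid, n_test)  # 700 200 100

# Assumed true accuracy of 0.9, measured on validation sets of two sizes.
se_200 = accuracy_standard_error(0.9, 200)  # roughly 0.021
se_20 = accuracy_standard_error(0.9, 20)    # roughly 0.067
print(f"{se_200:.3f} vs {se_20:.3f}")
```

Under these assumptions, shrinking the validation set from 200 images to 20 roughly triples the uncertainty of the measured score, which is the cost of moving too many images into training.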

Conclusion

There's a new way to edit your train/validation/test split in-app in Roboflow. Use it wisely.

Happy splitting, and as always, happy training!

Cite this Post

Use the following entry to cite this post in your research:

Jacob Solawetz. (Nov 20, 2020). Revamping Train, Validation, Test, Split Management. Roboflow Blog: https://blog.roboflow.com/revamping-train-test-split-management/

Written by

Jacob Solawetz
Founding Engineer @ Roboflow - ascending the 1/loss