Train Test Split Guide and Overview

To ensure our models generalize well rather than memorize their training data, it is best practice to create a train/test split. Without that rigor, a model can easily overfit to the small subset of examples we've collected. Look no further than Tesla using computer vision to identify stop signs: there is significantly more variation than one would anticipate.

A train, valid, and test split visualized in Roboflow.

By default, Roboflow prompts users to create train, valid, and test splits at the time of upload to encourage model building best practices. The default settings split a user's data into a 70 / 20 / 10 split: 70 percent of the examples are in the training set, 20 percent are in the validation set, and 10 percent are held out in the testing set.
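To make the 70 / 20 / 10 ratio concrete, here is a minimal Python sketch of a random three-way split. It illustrates the idea only and is not Roboflow's implementation; the function name `split_dataset`, its parameters, and the example filenames are all assumptions made for this example.

```python
import random


def split_dataset(items, train=0.7, valid=0.2, seed=0):
    """Shuffle items and split them into train / valid / test lists.

    Hypothetical helper for illustration; whatever is left after the
    train and valid portions goes to the test set.
    """
    shuffled = list(items)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)

    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }


# Example with 100 made-up filenames.
images = [f"img_{i}.jpg" for i in range(100)]
splits = split_dataset(images)
print({name: len(files) for name, files in splits.items()})
```

Each image lands in exactly one split, so the test set stays unseen during training.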

When uploading images, Roboflow prompts a user to create a train, valid, and test split.

However, there may be times when you want greater control over exactly which images are in your training, validation, or testing set. In fact, Andrej Karpathy of Tesla spends as much time on test set curation as on training set curation.

Adjusting splits in Roboflow is simple. When uploading data, a user can select whether the images in the current upload should go in the training, validation, or testing set.

Select if the images should go in the training set, validation set, or testing set.

Once we've added images to one split in our dataset, we can select "Add More Images" to repeat the upload process, except we may select "Validation" or "Testing" for our next batch of uploaded images.

On the right-hand side, we can select "Add More Images" to expand a given image dataset.

As a bonus, if your images are already organized in Train, Valid, and Test folders locally and you drop these folders into Roboflow, Roboflow will automatically detect this folder structure at the time of upload.
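If you want to check how your local folders would map to splits before uploading, a small Python sketch like the one below can help. It is a guess at the general idea and not Roboflow's actual detection logic; `infer_split`, `group_by_split`, the "unassigned" bucket, and the example paths are hypothetical names chosen for illustration.

```python
from pathlib import PurePosixPath

# Folder names taken from the article: Train, Valid, and Test.
SPLITS = ("train", "valid", "test")


def infer_split(path):
    """Return the split named by a parent folder of path, or None.

    Matching is case-insensitive and ignores the filename itself.
    """
    for part in PurePosixPath(path).parts[:-1]:
        if part.lower() in SPLITS:
            return part.lower()
    return None


def group_by_split(paths):
    """Group paths by inferred split; unmatched paths go to 'unassigned'."""
    groups = {name: [] for name in SPLITS}
    groups["unassigned"] = []
    for path in paths:
        groups[infer_split(path) or "unassigned"].append(path)
    return groups


# Example with made-up, forward-slash paths.
paths = ["dataset/Train/a.jpg", "dataset/Valid/b.jpg", "dataset/Test/c.jpg", "misc/d.jpg"]
print(group_by_split(paths))
```

Images outside any recognized folder end up under "unassigned", which makes it easy to spot files you would still need to place by hand.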

Roboflow detects the Train, Valid, and Test folders and suggests that the images be split according to Existing Values.

Be sure to refer to the Roboflow documentation for additional tips!
