Train Test Split Guide and Overview

To ensure our models generalize well rather than memorize the training data, it is best practice to create a train/test split. Without that rigor, a model can easily overfit to the small subset of examples we've collected. Look no further than Tesla using computer vision to identify stop signs: there is significantly more variation than one would anticipate.

A train, valid, and test split (70/20/10) visualized in Roboflow.

By default, Roboflow prompts users to create train, valid, and test splits at the time of upload to encourage model building best practices. The default settings split a user's data into a 70 / 20 / 10 split: 70 percent of the examples are in the training set, 20 percent are in the validation set, and 10 percent are held out in the testing set.
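To make the 70 / 20 / 10 proportions concrete, here is a minimal sketch of the same kind of split done locally in plain Python. It is not Roboflow's implementation; the function name `split_dataset`, the fixed seed, and the example file names are assumptions made for illustration.

```python
import random


def split_dataset(items, train=0.7, valid=0.2, seed=0):
    """Shuffle items and cut them into train / valid / test lists.

    Whatever is left after the train and valid cuts becomes the test set.
    """
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }


# Hypothetical file names, used only to show the resulting proportions.
images = [f"img_{i:03d}.jpg" for i in range(100)]
splits = split_dataset(images)
counts = {name: len(files) for name, files in splits.items()}
print(counts)
```

With 100 images this yields 70 training, 20 validation, and 10 testing examples, and every image lands in exactly one split.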

When uploading images, Roboflow prompts a user to create a train, valid, and test split.

However, there may be times when you want greater control over exactly which images are in your training, validation, or testing set. In fact, Andrej Karpathy of Tesla spends as much time on test set curation as training set curation.

Adjusting splits in Roboflow is simple. When uploading data, a user can select whether the images in the current upload should go in the training, validation, or testing set.

Select if the images should go in the training set, validation set, or testing set.

Once we've added images to one split in our dataset, we can select "Add More Images" to repeat the upload process, this time selecting "Validation" or "Testing" for the next batch of uploaded images.

On the right-hand side, we can select "Add More Images" to expand a given image dataset.

As a bonus, if your images are already organized in Train, Valid, and Test folders locally and you drop these folders into Roboflow, Roboflow will automatically detect this folder structure at the time of upload.

Roboflow detects the Train, Valid, and Test folders and suggests that the images be split according to Existing Values.
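For intuition, folder-based detection of this kind can be sketched in a few lines of Python. This is an illustrative assumption about how such detection might work, not Roboflow's actual logic; the helper names `detect_split` and `group_by_split`, the case-insensitive matching, and the example paths are all hypothetical.

```python
from pathlib import PurePath

# Assumed folder names; matched case-insensitively in this sketch.
SPLIT_NAMES = {"train", "valid", "test"}


def detect_split(path):
    """Return the split implied by the first matching parent folder, or None."""
    # parts[:-1] skips the file name itself and looks only at folders.
    for part in PurePath(path).parts[:-1]:
        if part.lower() in SPLIT_NAMES:
            return part.lower()
    return None


def group_by_split(paths):
    """Group file paths by detected split; undetected files go under None."""
    groups = {}
    for p in paths:
        groups.setdefault(detect_split(p), []).append(p)
    return groups


# Hypothetical local layout, used only to exercise the helpers.
paths = [
    "dataset/Train/stop_001.jpg",
    "dataset/Valid/stop_002.jpg",
    "dataset/Test/stop_003.jpg",
    "dataset/stop_004.jpg",
]
groups = group_by_split(paths)
print(groups)
```

Files outside any recognized folder are grouped under `None`, which leaves room to assign them a split by hand.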

Be sure to refer to the Roboflow documentation for additional tips!

Cite this Post

Use the following entry to cite this post in your research:

Joseph Nelson. (Oct 28, 2020). Train Test Split Guide and Overview. Roboflow Blog: https://blog.roboflow.com/train-test-split-with-roboflow/

Discuss this Post

If you have any questions about this blog post, start a discussion on the Roboflow Forum.
