YOLO was designed exclusively for object detection. However, it has proven influential in the creation of high-speed image segmentation architectures such as YOLACT.
The recently released YOLOv7 model natively supports not only object detection but also image segmentation. Using this technique, you can locate objects in a photo or video with great precision.
This tutorial will show you how to leverage this latest iteration of the YOLO model to perform concrete crack instance segmentation. Let's begin!
Step 1: Setting up a Python Environment
Before we train our custom model, we must ensure that we can access the GPU. Although CPU training is possible, it is inefficient and time-consuming, especially for instance segmentation as it is more resource-demanding than object detection.
Run nvidia-smi to confirm that everything works as expected. If this command returns an error, try enabling GPU acceleration for your environment. The process varies depending on the platform you are using (e.g. between different hosted notebooks). For Google Colab, you can enable GPU acceleration by opening Notebook settings, finding Hardware accelerator, and setting it to GPU.
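If you prefer to run the GPU check from Python rather than the shell, a minimal sketch could look like the one below. The helper name gpu_is_visible is made up for this example; it only reports whether nvidia-smi can be found and exits cleanly, nothing more.

```python
import shutil
import subprocess


def gpu_is_visible() -> bool:
    """Return True if nvidia-smi is on PATH and exits with status 0."""
    executable = shutil.which("nvidia-smi")
    if executable is None:
        # nvidia-smi is not installed or not on PATH in this environment.
        return False
    try:
        result = subprocess.run(
            [executable], capture_output=True, text=True, timeout=30, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GPU visible:", gpu_is_visible())
```

If this prints False, revisit the accelerator settings of your environment before moving on.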
To make it easier to manage file and script paths later in the tutorial, let's create a HOME constant that stores the location of the root directory.
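One simple way to define it, assuming the notebook's current working directory is the root you want to work from:

```python
import os

# Assumption: the current working directory is the project root.
HOME = os.getcwd()
print(HOME)
```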
Now we are ready to install YOLOv7 and its dependencies. We start by cloning the official repository and then immediately switch the git branch to u7. Unlike YOLOv5, where all supported computer vision tasks are available in the same codebase, YOLOv7 stores each task on a separate branch. Instance segmentation can be found on u7. Where did the descriptive name come from? No clue! That's what I call proper engineering practices.
To ensure no breaking changes are introduced, we check out a specific commit from the instance segmentation branch. It is identified by the long hash beginning with 44f30a. At this point, the only thing left is to navigate to the seg subdirectory and install all the dependencies listed in its requirements file.
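The setup steps above can be sketched as a small command builder. This is a dry run that only prints the commands; nothing is cloned or installed. The helper name build_setup_commands, the yolov7 target directory, and the requirements.txt filename are assumptions made for illustration, and the commit argument is a placeholder you must replace with the full hash you want to pin.

```python
import os
import shlex
from typing import List

REPO_URL = "https://github.com/WongKinYiu/yolov7"  # official YOLOv7 repository
BRANCH = "u7"  # branch that holds the instance segmentation code


def build_setup_commands(home: str, commit: str) -> List[str]:
    """Return the shell commands for the setup, in order, without running them."""
    repo_dir = os.path.join(home, "yolov7")  # assumed clone location
    # Assumption: the dependency list in seg/ is called requirements.txt.
    requirements = os.path.join(repo_dir, "seg", "requirements.txt")
    return [
        f"git clone {REPO_URL} {shlex.quote(repo_dir)}",
        f"git -C {shlex.quote(repo_dir)} checkout {BRANCH}",
        f"git -C {shlex.quote(repo_dir)} checkout {shlex.quote(commit)}",
        f"pip install -r {shlex.quote(requirements)}",
    ]


# "<full-commit-hash>" is a placeholder, not a real hash.
for command in build_setup_commands(os.getcwd(), "<full-commit-hash>"):
    print(command)
```

In a notebook you would run each printed line in a shell cell once the placeholder is replaced.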
Step 2: Inferring with a Pre-Trained Model
One of the best ways to test whether the environment was installed successfully is to run a test inference. In our case, we will use the YOLOv7 instance segmentation model pre-trained on the COCO dataset. Let's download the weights from the GitHub repository first and create a WEIGHTS_PATH constant to store the path to that file.
Now we can use the predict.py script to load the model into memory and perform inference on the selected image or video. The results will be saved in the runs directory.
Accuracy vs Speed Trade-off
The trade-off between accuracy and speed is common in Computer Vision and applies to many different types of models, not just those used for instance segmentation. More accurate models tend to be slower, as they require more computations to make predictions. On the other hand, faster models tend to have lower accuracy, as they are making fewer computations and may not be able to capture as much detail.
For example, OneFormer is a model that is known for its high accuracy across all segmentation tasks, but it also requires a larger number of computations, which makes it slower to use. On the other hand, YOLOv7 has a lower accuracy, but it's a lot faster. The trade-off between accuracy and speed can be critical when selecting a model for a particular task. It will determine the balance between the model's performance and the resources required to run it.
If you are looking for a fast and efficient instance segmentation model for a real-time use case, YOLOv7 is a great candidate.
Step 3: Preparing a Custom Dataset for Instance Segmentation
I found my dataset by browsing through Roboflow Universe. Lucky guy! I did not need to gather images, label them, or convert annotation formats. If you are working on something truly original, you will probably have to start from scratch. Nevertheless, Roboflow makes this process as straightforward as possible.
Create Instance Segmentation Dataset
Create a new project in the Roboflow dashboard and select Instance Segmentation as the Project Type.
Next, add the data to your newly created project. You can do it via API or through our web interface. If you're using a dataset from Roboflow Universe as a starting point, you can download the data with the annotations already done for you (assuming the dataset you are using contains annotated images). This is a great option as it minimizes the amount of manual annotation you have to do yourself.
If you drag and drop a directory with a dataset in a supported format, the Roboflow dashboard will automatically read the images and annotations together. If you only have images, you can label them in Roboflow Annotate.
When labeling for instance segmentation tasks, it's important to use polygon annotations. This is key because we want the model to learn the precise shape of each object (as opposed to object detection, where a bounding box around an object is sufficient).
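To see why the polygon matters for something as thin as a crack, here is a small sketch comparing the area of a polygon with the area of its bounding box. The crack coordinates are made up for illustration and the helper names are not part of any library.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon computed with the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box_area(points: List[Point]) -> float:
    """Area of the axis-aligned box that encloses all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Hypothetical thin crack running diagonally across a 10 x 10 region.
crack = [(0.0, 0.0), (1.0, 0.0), (10.0, 9.0), (10.0, 10.0), (9.0, 10.0), (0.0, 1.0)]

print(polygon_area(crack))       # 19.0
print(bounding_box_area(crack))  # 100.0
```

In this toy case the box covers roughly five times more area than the crack itself, which is the kind of imprecision polygon labels avoid.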
After labeling the data, we can apply preprocessing and augmentation to increase the size of our dataset and account for cases in which our model may struggle to predict the object.
Export Your YOLOv7 Instance Segmentation Dataset
One of the most convenient ways to download your dataset from Roboflow Universe is to use our pip package. You can generate the appropriate code snippet directly in our UI. On your dataset's Universe home page, click the Download this Dataset button and then select the YOLO v7 PyTorch export format.
After a few seconds, you will see a code snippet with all the necessary parameters filled in. You can simply copy and paste it into your Jupyter Notebook. When you execute it, the dataset will be downloaded to your machine in the appropriate format. Magic!
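For orientation, the generated snippet usually has roughly the shape sketched below. Treat this as an approximation rather than the exact code the UI produces: the wrapper function download_dataset is made up for this example, every argument is a placeholder, and the "yolov7" format identifier is an assumption about what the YOLO v7 PyTorch export maps to. Always prefer the snippet generated for your own dataset.

```python
def download_dataset(api_key: str, workspace: str, project: str, version: int):
    """Download one dataset version via the roboflow pip package."""
    # Imported lazily so this function can be defined without the package installed.
    from roboflow import Roboflow  # pip install roboflow

    rf = Roboflow(api_key=api_key)
    project_handle = rf.workspace(workspace).project(project)
    # Assumption: "yolov7" is the identifier of the YOLO v7 PyTorch export format.
    return project_handle.version(version).download("yolov7")


# Example call with placeholder values (requires a valid API key and network access):
# dataset = download_dataset("YOUR_API_KEY", "your-workspace", "your-project", 1)
# print(dataset.location)
```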
YOLOv7 Instance Segmentation Dataset Structure
After downloading the YOLOv7 dataset, we can take a quick look at its file structure. The directory contains images and labels divided into three subsets: train, test, and validation. In addition, there will be a data.yaml file in the dataset root directory.
HOME/
└── dataset-name/
    ├── train/
    │   ├── images/
    │   │   ├── image-0.jpg
    │   │   ├── image-1.jpg
    │   │   └── ...
    │   └── labels/
    │       ├── image-0.txt
    │       ├── image-1.txt
    │       └── ...
    ├── test/
    │   ├── images/
    │   │   └── ...
    │   └── labels/
    │       └── ...
    ├── valid/
    │   ├── images/
    │   │   └── ...
    │   └── labels/
    │       └── ...
    └── data.yaml
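If you want to sanity-check a downloaded dataset against this layout, a minimal sketch might look like the following. The helper name find_layout_problems is made up for this example. It reports a missing data.yaml, missing images or labels folders, and images without a label file of the same name; whether an unlabeled image is actually a problem depends on your dataset, so treat the output as a list of things to review.

```python
import os
from typing import List

SUBSETS = ("train", "valid", "test")


def find_layout_problems(dataset_dir: str) -> List[str]:
    """Return human-readable notes about deviations from the expected layout."""
    problems = []
    if not os.path.isfile(os.path.join(dataset_dir, "data.yaml")):
        problems.append("missing data.yaml")
    for subset in SUBSETS:
        images_dir = os.path.join(dataset_dir, subset, "images")
        labels_dir = os.path.join(dataset_dir, subset, "labels")
        if not os.path.isdir(images_dir) or not os.path.isdir(labels_dir):
            problems.append(f"{subset}: missing images/ or labels/")
            continue
        # Compare file names with the extension stripped.
        label_stems = {os.path.splitext(name)[0] for name in os.listdir(labels_dir)}
        for name in sorted(os.listdir(images_dir)):
            if os.path.splitext(name)[0] not in label_stems:
                problems.append(f"{subset}: {name} has no label file")
    return problems
```

Calling find_layout_problems on your dataset directory and getting an empty list back means the layout matches the tree above.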
Each label file must be in .txt format and have the same name (except for the extension) as the corresponding image. Each line represents a separate polygon and has the following structure:
class_index p1.x p1.y p2.x p2.y p3.x p3.y ...
class_index is a number between 0 and n-1 representing the position of the given class name in the class list. p1, p2, and p3 are consecutive points forming the polygon.
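To make the format concrete, here is a minimal sketch of a parser for a single label line. The function name parse_label_line is made up for this example, and it assumes whitespace-separated fields with at least three points per polygon.

```python
from typing import List, Tuple


def parse_label_line(line: str) -> Tuple[int, List[Tuple[float, float]]]:
    """Split one label line into its class index and a list of (x, y) points."""
    fields = line.split()
    # One class index plus an even number of coordinates, at least three points.
    if len(fields) < 7 or len(fields) % 2 == 0:
        raise ValueError("expected class_index followed by at least 3 x/y pairs")
    class_index = int(fields[0])
    coords = [float(value) for value in fields[1:]]
    points = list(zip(coords[0::2], coords[1::2]))
    return class_index, points


print(parse_label_line("0 0.1 0.2 0.3 0.4 0.5 0.6"))
# (0, [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)])
```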
There are dozens of different Computer Vision annotation formats. If you want to learn more about them, visit our formats directory, where we talk about each of them in detail and show how you can convert data between different formats.
Step 4: Train a Model on a Custom Dataset
The most difficult part of the task is behind us: YOLOv7 is installed and the custom dataset is created. Now we're ready to start training our model. Before we kick off, let's take a moment to consider the values of the parameters we pass. Most notably, we should pay attention to epochs, batch-size, and img-size. There are many more, but these three are crucial to training performance.
epochs refers to the number of times the model will cycle through the data during training. batch-size is the number of samples per gradient update, and img-size is the dimensions of the input images.
If you have a dataset of 10,000 images and a batch-size of 100, it will take 100 gradient updates to complete 1 epoch. On the other hand, img-size determines how many pixels the model has to process for each image. Increasing img-size can improve model performance but may also increase training time and require more computational resources.
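The arithmetic above can be sketched in a few lines. The helper names are made up for this example; the sketch assumes that a final partial batch still counts as one gradient update and that inputs are square, and the 640 and 1280 sizes are illustrative values only.

```python
import math


def updates_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of gradient updates needed to see every image once."""
    return math.ceil(num_images / batch_size)


def pixels_per_image(img_size: int) -> int:
    """Pixel count of a square img_size x img_size input."""
    return img_size * img_size


print(updates_per_epoch(10_000, 100))                  # 100
print(pixels_per_image(1280) / pixels_per_image(640))  # 4.0
```

Doubling img-size quadruples the number of pixels per image, which is why larger inputs get expensive quickly.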
As with inference, the results of the training, particularly the weights, are stored in the runs directory. On top of that, you will find graphs illustrating the change in key metrics across training epochs and the inference results of a freshly trained model on selected images from the validation set.
Step 5: Evaluating the Model
We evaluate deep learning models on a test dataset to measure their generalization performance. This refers to how well the model can predict outcomes for new, unseen data. This is important because we want to ensure that the model has learned meaningful relationships present in the data that can be applied to unseen data (data not used in training).
Test images are usually selected by randomly taking a sample of the available data and excluding it from the training process. This allows us to evaluate the model's performance on data it has not seen during training and gives us a better idea of how the model will perform on real-life data. It is essential to carefully select and design the test data set to ensure that it is representative of the types of data the model will encounter in the real world.
At Roboflow, we let you choose the proportion of your train, test, and validation subsets. We handle the data splitting process, so you don't have to worry about splitting data yourself.
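For intuition, a random split of the kind described above can be sketched as follows. This is not how Roboflow implements it; the function name, the default fractions, and the fixed seed are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence


def split_dataset(
    items: Sequence[str],
    valid_fraction: float = 0.2,
    test_fraction: float = 0.1,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Shuffle items and cut them into disjoint train, valid, and test subsets."""
    shuffled = list(items)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_valid = int(len(shuffled) * valid_fraction)
    return {
        "test": shuffled[:n_test],
        "valid": shuffled[n_test:n_test + n_valid],
        "train": shuffled[n_test + n_valid:],
    }


names = [f"image-{i}.jpg" for i in range(100)]
subsets = split_dataset(names)
print({name: len(files) for name, files in subsets.items()})
# {'test': 10, 'valid': 20, 'train': 70}
```

Because every image lands in exactly one subset, the test images are never seen during training.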
Instance segmentation is a computer vision task that finds applications in many fields, from medicine to autonomous cars. This tutorial will allow you to train your own model that precisely detects cracks in an image.
Using this guide, you can build a model that identifies structural damage in buildings or bridges. Models like that could be helpful for insurance companies to measure risk, for building inspectors, and for many others.
Use the code in our Notebook to bootstrap your project and unleash your creativity. Most importantly, let us know what you've been able to build.