Computer vision models learn a task from a training set. However, like all deep learning models, they are prone to overfitting the data they have been shown, which leads to poor inferences in the wild.

Computer vision models, like all machine learning models, tend to overfit to the training data.
Overfitting means worse performance on data the model has never seen. Learn more on Train/Test splits.

Enter Occlusion Techniques

In this post, we give an overview of occlusion techniques in computer vision and discuss how occlusion-based data augmentation can be used to combat overfitting.

What is Occlusion in Computer Vision?

Occlusion techniques in computer vision block out a portion of an image at training time, challenging the network to learn not to rely on canonical features.

For example, if we were to train an object detection model to detect dogs, we might only have training data with the dog's head in view. Looking at the Class Activation Map (CAM), which highlights the pixels that contribute the most to a prediction, we might see that the network relies heavily on the dog's head to make predictions.

(cite)

But what about when the dog's head is behind a bush? We want our network to generalize to situations like this, so it may be advantageous to occlude the dog's head in some training images and force the network to learn to identify a dog from other parts of its shape.

The model above would not do so well if the dog's head were missing (cite)

Predecessor Occlusion Techniques

Research on occlusion techniques for training deep learning computer vision models is not brand new: it started as early as 2017.

Random Erase - A random rectangular section of the image is erased and replaced with noisy pixels. In a training pipeline, random erase is typically implemented with a random location and a random width and height within a set range. Random erasures can also be applied probabilistically, so that only some images are altered.

(cite)
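As a rough illustration of the random erase description above, here is a minimal NumPy sketch. The function name `random_erase` and the parameters `p` and `size_range` are assumptions made for this example, not names taken from any particular library or paper.

```python
import numpy as np


def random_erase(image, rng, p=0.5, size_range=(0.1, 0.4)):
    """Return a copy of `image` (H, W[, C], uint8) with one noisy rectangle.

    `p` is the probability that the erase is applied at all. `size_range`
    bounds the rectangle's height and width as fractions of the image size.
    Both parameters are illustrative assumptions.
    """
    if rng.random() >= p:
        return image.copy()  # leave this sample untouched

    h, w = image.shape[:2]
    # Random height and width within the configured range.
    eh = max(1, int(h * rng.uniform(*size_range)))
    ew = max(1, int(w * rng.uniform(*size_range)))
    # Random location such that the rectangle fits inside the image.
    top = int(rng.integers(0, h - eh + 1))
    left = int(rng.integers(0, w - ew + 1))

    out = image.copy()
    noise_shape = (eh, ew) + image.shape[2:]
    out[top:top + eh, left:left + ew] = rng.integers(
        0, 256, size=noise_shape, dtype=np.uint8
    )
    return out
```

Calling `random_erase(img, np.random.default_rng(0), p=1.0)` always erases one rectangle, while `p=0.0` returns an unchanged copy.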

Cutout - Random squares are cut out of the image. In the original Cutout paper, the mask is applied at the input layer of the CNN rather than to intermediate feature maps (as dropout would be), so the hidden pixels are also unavailable to every downstream layer.

(cite)
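A Cutout-style augmentation can be sketched in a few lines of NumPy. This is a simplified illustration: the function name `cutout`, the `size` and `fill` parameters, and the choice to clip the square at the image border are assumptions made for the example.

```python
import numpy as np


def cutout(image, rng, size=16, fill=0):
    """Return a copy of `image` with one square patch set to `fill`.

    The square is centred on a random pixel and clipped at the border,
    so patches near the edge hide fewer pixels (an assumption of this sketch).
    """
    h, w = image.shape[:2]
    cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)

    out = image.copy()
    out[y0:y1, x0:x1] = fill
    return out
```

Because the mask is written into the input image itself, nothing later in the network can see the hidden pixels.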

Hide and Seek - Divide the image into a grid and randomly hide each cell of the grid with some probability. This is similar to Grid Mask, but the hidden cells are chosen at random rather than following a fixed pattern.

(cite)
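The grid-and-probability idea behind Hide and Seek can be sketched as follows. The function name `hide_and_seek` and the parameters `grid`, `p_hide`, and `fill` are hypothetical names chosen for this example, and filling with a constant value is an assumption.

```python
import numpy as np


def hide_and_seek(image, rng, grid=4, p_hide=0.5, fill=0):
    """Hide each cell of a `grid` x `grid` layout with probability `p_hide`.

    Returns the augmented copy and the boolean (grid, grid) array that
    records which cells were hidden.
    """
    h, w = image.shape[:2]
    # Cell boundaries along each axis.
    ys = np.linspace(0, h, grid + 1).astype(int)
    xs = np.linspace(0, w, grid + 1).astype(int)
    # Independent coin flip per cell.
    hidden = rng.random((grid, grid)) < p_hide

    out = image.copy()
    for i in range(grid):
        for j in range(grid):
            if hidden[i, j]:
                out[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = fill
    return out, hidden
```

Returning the `hidden` array makes it easy to check or visualize which cells were dropped for a given sample.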

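Grid Mask - Draw a grid over the image and hide a regularly spaced pattern of squares, so the hidden regions are evenly distributed across the image rather than chosen at random.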

(cite)
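A regularly spaced mask of this kind can be sketched with modular arithmetic on the row and column indices. The names `grid_mask`, `period`, and `hide_frac` are assumptions made for this example and do not correspond to the parameter names of any specific implementation.

```python
import numpy as np


def grid_mask(image, rng, period=16, hide_frac=0.5, fill=0):
    """Hide evenly spaced squares: one square per `period` x `period` tile.

    `hide_frac` is the side of each hidden square as a fraction of `period`.
    A random offset shifts the whole pattern so it differs between samples.
    """
    h, w = image.shape[:2]
    side = int(period * hide_frac)
    off_y, off_x = int(rng.integers(0, period)), int(rng.integers(0, period))

    # A pixel is hidden when both its row and its column fall in a hidden band.
    rows = (np.arange(h) + off_y) % period < side
    cols = (np.arange(w) + off_x) % period < side
    mask = rows[:, None] & cols[None, :]

    out = image.copy()
    out[mask] = fill
    return out, mask
```

Unlike the random techniques above, the only randomness here is the offset of the pattern, so the amount of hidden area is roughly constant from image to image.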

State of the Art Occlusion Techniques

New techniques have grown out of the old techniques, pushing the state of the art in computer vision modeling with data augmentation.

CutMix - A section of the image is randomly cut out and replaced with a section from a different image. This forces the model not only to predict around occlusion, but also to avoid relying too heavily on the surroundings in which a given object or class label typically appears.

The CutMix authors show how CutMix can work better than similar techniques. (cite)
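To make the CutMix idea concrete, here is a minimal NumPy sketch for the classification case, where the labels are mixed in proportion to the pasted area. The function name `cutmix` and the `alpha` parameter of the Beta distribution are assumptions made for this example, and both images are assumed to have the same shape. For object detection, the bounding boxes would need to be merged instead, which this sketch does not attempt.

```python
import numpy as np


def cutmix(img_a, label_a, img_b, label_b, rng, alpha=1.0):
    """Paste a random rectangle of `img_b` into a copy of `img_a`.

    Labels are one-hot (or soft) vectors. The returned label weights each
    class by the fraction of pixels that came from its image.
    """
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)  # target share of img_a in the result
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))

    # Random centre; the box is clipped at the image border.
    cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
    y0, y1 = max(0, cy - cut_h // 2), min(h, cy + cut_h // 2)
    x0, x1 = max(0, cx - cut_w // 2), min(w, cx + cut_w // 2)

    mixed = img_a.copy()
    mixed[y0:y1, x0:x1] = img_b[y0:y1, x0:x1]

    # Recompute the share from the box that was actually pasted.
    lam = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    return mixed, lam * label_a + (1.0 - lam) * label_b
```

If a quarter of the pixels come from the second image, the mixed label assigns 0.75 to the first image's class and 0.25 to the second's.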

Mosaic - Mosaic is not an occlusion technique directly, but it accomplishes a similar goal. Mosaic stitches together four images in a window, randomly shifting and cropping them along the way. This forces the model to learn to recognize objects that are cut off at the edges, to identify objects within different contexts, and to identify objects in different portions of the image, improving resiliency.

(cite)
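The stitching step can be sketched as follows. This is a simplified, image-only illustration: the function name `mosaic`, the `out_size` parameter, and the range used for the random centre are assumptions made for the example, each source image is assumed to be at least `out_size` pixels on each side, and the bounding-box bookkeeping that a detection pipeline needs is omitted.

```python
import numpy as np


def mosaic(images, rng, out_size=64):
    """Stitch four images into one square canvas around a random centre.

    Each image fills one quadrant and is randomly cropped to fit it.
    Returns the canvas and the (row, col) of the shared corner.
    """
    assert len(images) == 4, "mosaic expects exactly four images"
    s = out_size
    # Random split point, kept away from the borders.
    cy = int(rng.integers(s // 4, 3 * s // 4))
    cx = int(rng.integers(s // 4, 3 * s // 4))

    canvas = np.zeros((s, s) + images[0].shape[2:], dtype=images[0].dtype)
    quadrants = [(0, cy, 0, cx), (0, cy, cx, s), (cy, s, 0, cx), (cy, s, cx, s)]
    for img, (y0, y1, x0, x1) in zip(images, quadrants):
        qh, qw = y1 - y0, x1 - x0
        # Random crop of the quadrant's size from the source image.
        top = int(rng.integers(0, img.shape[0] - qh + 1))
        left = int(rng.integers(0, img.shape[1] - qw + 1))
        canvas[y0:y1, x0:x1] = img[top:top + qh, left:left + qw]
    return canvas, (cy, cx)
```

Because the split point moves on every call, the same four images produce differently cropped and positioned objects each time.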

A Hands On Occlusion Example (Chess)

Let's suppose we want to train an object detection model to recognize chess pieces. We have gathered a chess dataset and made it public on Roboflow.

With limited training data, it is possible that our model will only see chess pieces in a non-occluded fashion like this image:

A non-occluded chess image in our training set

Yet, at inference time, our model may need to make predictions on chess pieces that are occluded like this image:

Occluded chess pieces in our testing set

Therefore, it may be advantageous to experiment with adding some occlusion augmentations to our training data to improve our model's resilience.

Experimenting with adding occlusion to our training images in Roboflow
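Outside of a hosted tool, the same experiment can be approximated in code. The sketch below keeps every original training image and adds a few occluded copies of it. The function names `occlude_patch` and `expand_with_occlusion`, the `(image, boxes)` sample layout, and the box tuple format are all assumptions made for this example and do not reflect Roboflow's implementation. Since blacking out pixels does not move any object, the bounding boxes are copied unchanged.

```python
import numpy as np


def occlude_patch(image, rng, size):
    """Return a copy of `image` with one random `size` x `size` square blacked out."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    out = image.copy()
    out[top:top + size, left:left + size] = 0
    return out


def expand_with_occlusion(samples, rng, copies=2, size=24):
    """Keep each (image, boxes) sample and append `copies` occluded variants."""
    expanded = []
    for image, boxes in samples:
        expanded.append((image, boxes))
        for _ in range(copies):
            # Boxes are unchanged: occlusion hides pixels but moves nothing.
            expanded.append((occlude_patch(image, rng, size), list(boxes)))
    return expanded
```

With `copies=2`, a training set of 100 labelled images becomes 300 samples, one third of which are unoccluded originals.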

Conclusion

Occlusion techniques can be used to improve the resiliency of your computer vision model, teaching it to generalize beyond the canonical features of an image.

State-of-the-art computer vision research continues to push the boundary of effective occlusion data augmentation strategies.

Occlusion is an important data augmentation strategy to explore if you believe your model is overfitting to certain features in your training set.

Happy occluding, and as always, happy training.