Semantic Segmentation vs. Instance Segmentation: Explained

Computer vision is among the most compelling technologies of the 21st century, as it has the potential to drive the world's transition to a better future.

There have been many notable developments in the computer vision ecosystem, and image segmentation stands out among them. Image segmentation has a wide range of applications, and today it is a core topic anyone working on computer vision projects should understand.

Today's article will dive deep into image segmentation, explain its two main types, and compare and contrast instance and semantic segmentation. We will also discuss applications where semantic and instance segmentation come into play.

What is Image Segmentation?

Image segmentation is the task of partitioning an image into regions by assigning a label to each pixel, so that objects of multiple categories can be identified and classified. But if you go deep into segmentation, it can get confusing, as the different types of segmentation differ considerably in how they work.

Anurag Arnab, Shuai Zheng et al., 2018, “Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation”

This article explains segmentation's theoretical and abstract principles. In order to prepare image data for segmentation tasks, you'll need tools to label your data at the pixel level. If you're here to create a segmentation project, you can use Roboflow Annotate to apply smart polygon annotations and create your training dataset. Then, refer to our step-by-step guide on How to Train a Segmentation Model on a Custom Dataset.

What is Semantic Segmentation?

Semantic segmentation is a technique that enables us to associate each pixel of a digital image with a class label, such as trees, signboards, pedestrians, roads, buildings, cars, sky, etc. It can be viewed as image classification at the pixel level: every pixel is assigned to one of the classes.

It is essential to understand that semantic segmentation classifies pixels into one or more classes rather than identifying individual real-world objects: two cars standing side by side receive the same label and are not separated from each other. It is a difficult task in the computer vision ecosystem because you classify every pixel instead of whole objects, which is the case in object detection.
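To make "classification at the pixel level" concrete, here is a minimal sketch. It assumes a model has already produced one score per class for every pixel; the class names and score values are made up for illustration, and in practice the scores would live in a NumPy array or tensor rather than nested lists.

```python
from typing import List

# Hypothetical class names, chosen for illustration only.
CLASSES = ["sky", "road", "car"]


def semantic_mask(scores: List[List[List[float]]]) -> List[List[int]]:
    """Turn per-pixel class scores (H x W x C) into a label map (H x W)."""
    return [
        # For each pixel, pick the index of the highest-scoring class.
        [max(range(len(pixel)), key=lambda c: pixel[c]) for pixel in row]
        for row in scores
    ]


# A tiny 2x2 "image" with one score per class at every pixel (made-up values).
scores = [
    [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],
    [[0.1, 0.7, 0.2], [0.1, 0.2, 0.7]],
]

mask = semantic_mask(scores)
labels = [[CLASSES[c] for c in row] for row in mask]
print(labels)
```

Every pixel ends up with exactly one class id, and nothing in the output says which object a pixel belongs to.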

ICNet for Real-Time Semantic Segmentation on High-Resolution Images

How Does Semantic Segmentation Work?

Semantic segmentation extracts features from an image and then uses them to group its pixels into distinct categories. The steps involved are as follows:

Analyze training data for classifying a specific object in the image.
Create a semantic segmentation network to localize the objects at the pixel level.
Train the semantic segmentation network to group the pixels in a localized image by creating a segmentation mask.

Note: The steps in semantic segmentation differ significantly from image classification, where we only assign a single class to the whole image.
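The training step can be illustrated with a per-pixel loss. The sketch below assumes the network outputs a probability per class for every pixel and uses mean cross-entropy against a ground-truth mask. This is a common choice for segmentation training, but it is an assumption here, not something this article prescribes; the function name and the values are hypothetical.

```python
import math
from typing import List


def pixel_cross_entropy(probs: List[List[List[float]]],
                        target: List[List[int]]) -> float:
    """Mean negative log-probability of the true class over all pixels."""
    losses = [
        -math.log(pixel_probs[label])
        for prob_row, target_row in zip(probs, target)
        for pixel_probs, label in zip(prob_row, target_row)
    ]
    return sum(losses) / len(losses)


# Predicted class probabilities for a 1x2 image with two classes (made-up values).
probs = [[[0.5, 0.5], [0.25, 0.75]]]
# Ground-truth segmentation mask: one class id per pixel.
target = [[0, 1]]

loss = pixel_cross_entropy(probs, target)
print(loss)
```

An image classifier would compute this loss once per image; a segmentation network computes it once per pixel, which is why it needs pixel-level labels.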

User:X93ma - statwiki. (n.d.). Retrieved October 2, 2022, from https://wiki.math.uwaterloo.ca/statwiki/index.php?title=User%3AX93ma

Applications of Semantic Segmentation

Medical Diagnostics: For detecting medical abnormalities in X-rays, CT scans, and MRI scans
GeoSensing: For land usage mapping from satellite imagery and monitoring areas of deforestation and urbanization
Autonomous Driving: For accurately detecting lanes, pedestrians, traffic signs, road, sky and other vehicles on the road

What is Instance Segmentation?

Instance segmentation is a form of image segmentation that deals with detecting and delineating each distinct instance of an object appearing in an image. It detects all instances of a class and additionally separates the individual instances within each class. Hence, it can be seen as combining the functionality of object detection and semantic segmentation.

Instance segmentation has a richer output format, as it creates a segment map for each instance of each class. For example, consider an image with dogs and cats. By running an instance segmentation model on that image, you can locate the bounding box of each dog and cat, plot a segmentation map for each dog and cat, and count how many dogs and cats are in the image.
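The dogs-and-cats example can be sketched as follows. The sketch assumes the model's output has already been converted into one (class label, binary mask) pair per instance; this data layout and the mask values are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 where the instance covers the pixel


def bounding_box(mask: Mask) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) of the pixels set in the mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical model output for a 3x4 image: one entry per detected instance.
instances = [
    ("dog", [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]),
    ("dog", [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]),
    ("cat", [[0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]),
]

counts = Counter(label for label, _ in instances)
boxes = [(label, bounding_box(mask)) for label, mask in instances]
print(counts, boxes)
```

Because each instance carries its own mask, counting and per-object bounding boxes fall out directly, which a single semantic label map cannot provide.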

Street view in Instance Segmentation

How Does Instance Segmentation Work?

Instance segmentation involves identifying the boundaries of objects at the pixel level, making it a complex task to perform. But as we saw earlier, instance segmentation consists of two main parts:

Object Detection: Firstly, it runs object detection to find all bounding boxes for every object in an image
Semantic Segmentation: After finding all the rectangles (bounding boxes), it uses a semantic segmentation model inside every rectangle
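The two parts above can be sketched as a toy pipeline. Both stages are stand-ins: the bounding boxes are hard-coded as if a detector had produced them, and a simple brightness threshold plays the role of the segmentation model inside each box. The names `segment_inside_box` and `threshold` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def segment_inside_box(image: List[List[float]], box: Box,
                       threshold: float = 0.5) -> List[List[int]]:
    """Return a full-size binary mask, labelling pixels only inside the box."""
    x0, y0, x1, y1 = box
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            # Stand-in for a learned per-pixel classifier.
            if image[y][x] > threshold:
                mask[y][x] = 1
    return mask


# Toy 3x4 grayscale image with two bright objects (made-up values).
image = [
    [0.9, 0.8, 0.0, 0.0],
    [0.7, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.8],
]

# Step 1 (assumed detector output): one bounding box per object.
boxes = [(0, 0, 1, 1), (3, 1, 3, 2)]

# Step 2: segment inside each box, producing one mask per instance.
instance_masks = [segment_inside_box(image, box) for box in boxes]
print(instance_masks)
```

Restricting the segmentation to one box at a time is what keeps the two objects in separate masks, even if they belonged to the same class.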

Note: Instance segmentation differentiates the individual instances within each class; for example, it will separate every person into a different instance rather than assign all people to a single "person" region.
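One way to see the difference between the two outputs is to collapse instance masks into a semantic mask. The sketch below assumes instances are given as (class label, binary mask) pairs and that class id 0 means background; both conventions are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]


def to_semantic(instances: List[Tuple[str, Mask]],
                class_ids: Dict[str, int]) -> Mask:
    """Merge per-instance binary masks into one class-id map (0 = background)."""
    height, width = len(instances[0][1]), len(instances[0][1][0])
    semantic = [[0] * width for _ in range(height)]
    for label, mask in instances:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    semantic[y][x] = class_ids[label]
    return semantic


# Two separate people in a 2x3 image (made-up masks).
instances = [
    ("person", [[1, 0, 0], [1, 0, 0]]),
    ("person", [[0, 0, 1], [0, 0, 1]]),
]

semantic = to_semantic(instances, {"person": 1})
print(semantic)
```

After the merge, both people share the same class id and the information about which pixels belong to which person is gone. Going the other way, from a semantic mask back to instances, is not possible in general without extra information.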

User:X93ma - statwiki. (n.d.). Retrieved October 2, 2022, from https://wiki.math.uwaterloo.ca/statwiki/index.php?title=User%3AX93ma

Applications of Instance Segmentation

Here are a few real-world applications of instance segmentation:

Medical Domain: Used to detect and segment tumors in MRI scans of the brain and nuclei in images
Satellite Imagery: Used to achieve a better separation between the objects, such as counting cars, detecting ships for maritime security, and sea pollution monitoring
Self-Driving Cars: Used in conjunction with dense distance-to-object estimation methods to provide high-resolution 3D depth estimation of a scene from monocular 2D images
Robotics: Used with self-supervised learning to segment visual observations into individual objects by interacting with the environment
Automation: Used for detecting dents on a car, separating buildings in a city, and more