Semantic Segmentation vs. Instance Segmentation: Explained

Computer vision is among the most compelling technologies of the 21st century, as it has the potential to drive the world's transition to a better future.

Many techniques have come and gone in the computer vision ecosystem, but image segmentation stands out. It has a wide range of applications, and today it is a core topic that anyone working on computer vision projects should understand.

Today's article will dive deep into image segmentation, explain its two main types, and compare and contrast instance and semantic segmentation. We will also discuss the applications where semantic and instance segmentation come into play.

Table of Contents:

  1. What is Image Segmentation?
  2. Semantic Segmentation: How it works and its applications
  3. Instance Segmentation: How it works and its applications
  4. Difference between Semantic Segmentation vs Instance Segmentation
  5. Conclusion

What is Image Segmentation?

Image segmentation is the task of partitioning an image into regions by identifying and classifying the objects, often of multiple categories, that its pixels belong to. Once you look more closely, it can get confusing, because the different types of segmentation differ considerably in what they output and how they work.

Anurag Arnab, Shuai Zheng, et al. (2018), “Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation”

This article explains segmentation's theoretical and abstract principles. In order to prepare image data for segmentation tasks, you'll need tools to label your data at the pixel level. If you're here to create a segmentation project, you can use Roboflow Annotate to apply smart polygon annotations and create your training dataset. Then, refer to our step-by-step guide on How to Train a Segmentation Model on a Custom Dataset.

What is Semantic Segmentation?

Semantic segmentation is a technique that enables us to associate each pixel of a digital image with a class label, such as trees, signboards, pedestrians, roads, buildings, cars, sky, etc. It is also considered an image classification task at a pixel level as it involves differentiating between objects in an image.

It is essential to understand that semantic segmentation classifies the pixels of an image into one or more classes rather than identifying individual real-world objects: every pixel of a given class receives the same label, no matter which object it belongs to. This makes it a demanding task in computer vision, because the model must classify every pixel instead of whole objects, as is the case in object detection.
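To make the per-pixel labeling described above concrete, here is a minimal sketch in plain Python. It assumes a model has already produced one score per class for every pixel; the class names, the `semantic_label_map` helper, and the score values are all made up for illustration and do not come from any particular library.

```python
# Hypothetical class names, used only for this illustration.
CLASSES = ["sky", "road", "car"]


def semantic_label_map(scores):
    """Turn per-pixel class scores into a label map.

    scores[y][x] is a list with one score per class. The result holds,
    for every pixel, the index of the highest-scoring class.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# A made-up 2x3 "image" of class scores in the order (sky, road, car).
scores = [
    [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.7, 0.1, 0.2]],
    [[0.1, 0.8, 0.1], [0.1, 0.2, 0.7], [0.1, 0.3, 0.6]],
]

label_map = semantic_label_map(scores)
named = [[CLASSES[i] for i in row] for row in label_map]

for row in named:
    print(row)
```

Note that the two "car" pixels in the bottom row share one label. Nothing in the output says whether they belong to one car or two, which is exactly the limitation instance segmentation addresses.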

ICNet for Real-Time Semantic Segmentation on High-Resolution Images

How Does Semantic Segmentation Work?

Semantic segmentation extracts features from an image and then uses them to assign each pixel to a distinct category. The steps involved are as follows:

  • Analyze training data to learn how to classify the specific objects in the image.
  • Create a semantic segmentation network to localize the objects in the image.
  • Train the semantic segmentation network to group the pixels by class, producing a segmentation mask.

Note: The steps in semantic segmentation differ significantly from image classification, where we only assign a single class to the whole image.
User:X93ma - statwiki. (n.d.). Retrieved October 2, 2022, from https://wiki.math.uwaterloo.ca/statwiki/index.php?title=User%3AX93ma

Applications of Semantic Segmentation

  1. Medical Diagnostics: For detecting medical abnormalities in X-Rays, CT Scans, MRI Scans
  2. GeoSensing: For land usage mapping from satellite imagery and monitoring areas of deforestation and urbanization
  3. Autonomous Driving: For accurately detecting lanes, pedestrians, traffic signs, road, sky and other vehicles on the road
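As a small illustration of the land usage mapping use case, a semantic label map can be summarized into the share of the image each class covers. This is a sketch under stated assumptions: the `class_coverage` helper and the tiny label map are hypothetical, and a real pipeline would work on model output for full-size satellite tiles.

```python
from collections import Counter


def class_coverage(label_map):
    """Return the fraction of pixels assigned to each class label."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Made-up 2x4 semantic label map for a satellite tile.
label_map = [
    ["forest", "forest", "urban", "urban"],
    ["forest", "forest", "urban", "water"],
]

coverage = class_coverage(label_map)
for label, fraction in sorted(coverage.items()):
    print(f"{label}: {fraction:.1%}")
```

Comparing these fractions across images of the same area taken at different times is one simple way to track changes such as deforestation or urbanization.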

What is Instance Segmentation?

Instance segmentation is a form of image segmentation that detects and delineates each distinct instance of an object appearing in an image. It finds all instances of a class and, in addition, marks the boundary of each instance separately. It is therefore often described as combining the functionality of object detection and semantic segmentation.

Instance segmentation has a richer output format, as it creates a segment map for each category and for each instance of that category. Simply put, suppose you have an image with dogs and cats. By running an instance segmentation model on that image, you can locate the bounding box of each dog and cat, plot a segmentation map for each dog and cat, and count how many dogs and cats are in the image.
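The dogs-and-cats example can be sketched in a few lines of plain Python. The `instances` list below is a hypothetical stand-in for a model's output (one class label and one binary mask per detected object), and `bounding_box` is an illustrative helper, not part of any library.

```python
from collections import Counter


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels in a binary mask."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None  # empty mask, nothing to enclose
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical instance segmentation output for a 3x4 image.
instances = [
    {"label": "dog", "mask": [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]},
    {"label": "dog", "mask": [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]},
    {"label": "cat", "mask": [[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]},
]

boxes = [bounding_box(inst["mask"]) for inst in instances]
counts = Counter(inst["label"] for inst in instances)

print(boxes)
print(dict(counts))
```

Because every object keeps its own mask, the bounding boxes and the per-class counts fall out directly, something a single semantic label map cannot provide.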

Street view in Instance Segmentation

How Does Instance Segmentation Work?

Instance segmentation involves identifying the boundaries of objects at a detailed pixel level, which makes it a complex task to perform. As we saw earlier, instance segmentation contains two significant parts:

  1. Object Detection: First, it runs object detection to find a bounding box for every object in the image
  2. Semantic Segmentation: After finding all the rectangles (bounding boxes), it runs a semantic segmentation model inside every rectangle

Note: Instance segmentation differentiates all the instances within each class; for example, it will mark every person as a separate instance rather than merging them into one region.
User:X93ma - statwiki. (n.d.). Retrieved October 2, 2022, from https://wiki.math.uwaterloo.ca/statwiki/index.php?title=User%3AX93ma
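The two-stage idea (detect boxes, then segment inside each box) can be sketched as follows. This is only a toy under loud assumptions: the boxes are supplied by hand instead of coming from a detector, and a simple intensity threshold stands in for the per-box segmentation model. The function name `segment_inside_boxes` is hypothetical.

```python
def segment_inside_boxes(image, boxes, threshold=0.5):
    """Assign an instance id to the foreground pixels inside each detected box.

    image: 2D list of intensities in [0, 1].
    boxes: list of (x_min, y_min, x_max, y_max), with inclusive coordinates.
    A threshold stands in for the segmentation model run inside each box.
    Returns an id map where 0 is background and k marks the k-th box's object.
    """
    height, width = len(image), len(image[0])
    ids = [[0] * width for _ in range(height)]
    for k, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                # Keep the first assignment if boxes overlap.
                if image[y][x] > threshold and ids[y][x] == 0:
                    ids[y][x] = k
    return ids


# Made-up 3x5 image with two bright objects.
image = [
    [0.9, 0.8, 0.1, 0.0, 0.7],
    [0.9, 0.2, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.1, 0.0, 0.8],
]
# Hand-written boxes standing in for an object detector's output.
boxes = [(0, 0, 1, 1), (4, 0, 4, 2)]

ids = segment_inside_boxes(image, boxes)
for row in ids:
    print(row)
```

Each object ends up with its own id even if both belong to the same class, which is what separates this output from a semantic label map.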

Applications of Instance Segmentation

Here are a few real-world applications of instance segmentation:

  1. Medical Domain: Used to detect and segment tumors in MRI scans of the brain and nuclei in images
  2. Satellite Imagery: Used to achieve a better separation between the objects, such as counting cars, detecting ships for maritime security, and sea pollution monitoring
  3. Self-Driving Cars: Used in conjunction with dense distance-to-object estimation methods to provide high-resolution 3D depth estimation of a scene from monocular 2D images
  4. Robotics: Used with self-supervised learning to segment visual observations into individual objects by interacting with the environment
  5. Automation: Used for detecting dents on a car, separating buildings in a city, and more

Semantic Segmentation vs. Instance Segmentation

| Semantic Segmentation | Instance Segmentation |
| --- | --- |
| 1. For each pixel in the given image, it detects the object category the pixel belongs to, where all object categories (labels) are known to the model. | 1. For each pixel in the given image, it identifies the object instance the pixel belongs to. It goes a step further than semantic segmentation and differentiates two objects with the same label. |
| 2. Example: Semantic segmentation cannot distinguish between different instances in the same category, e.g. all chairs are marked blue. | 2. Example: Instance segmentation can distinguish between different instances of the same category, e.g. different chairs are marked in different colours. |
| 3. Each pixel is labelled with its category directly; objects of the same category are not separated from one another. | 3. It is a hybrid of target detection and semantic segmentation. |
| 4. Open-source datasets include the Stanford Background Dataset, Microsoft COCO Dataset, MSRC Dataset, KITTI Dataset, and Microsoft AirSim Dataset. | 4. Open-source datasets include the LiDAR Bonnetal Dataset, HRSID (High-Resolution SAR Images Dataset), SSDD (SAR Ship Detection Dataset), Pascal SBD Dataset, and iSAID (A Large Scale Aerial Images Dataset). |
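The chair example from the comparison can be shown in code. The sketch below assumes an instance id map and a hypothetical `id_to_class` lookup, and collapses them into a semantic map with an illustrative `to_semantic` helper. The conversion only works in this direction: once the ids are gone, the semantic map alone cannot tell the two chairs apart.

```python
def to_semantic(instance_ids, id_to_class):
    """Collapse an instance id map into a semantic map of class names."""
    return [
        [id_to_class.get(i, "background") for i in row]
        for row in instance_ids
    ]


# Made-up instance id map: 0 is background, 1 and 2 are two chairs, 3 is a table.
instance_ids = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 0, 3],
]
id_to_class = {1: "chair", 2: "chair", 3: "table"}

semantic = to_semantic(instance_ids, id_to_class)

# The instance map knows how many chairs there are...
chair_instances = {
    i for row in instance_ids for i in row if id_to_class.get(i) == "chair"
}
# ...while the semantic map only knows which classes are present.
semantic_classes = {label for row in semantic for label in row}

print(len(chair_instances))
print(sorted(semantic_classes))
```

If your application needs to count or track individual objects, this is the information you would lose by choosing semantic segmentation.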

Using Instance and Semantic Segmentation in Vision Applications

You now know the fundamental difference between instance and semantic segmentation, how each works, the possible areas of interest where it can be applied, and when to use them.

As a next step, put your knowledge into practice by training SegFormer, a state-of-the-art semantic segmentation algorithm, on a custom dataset!