Meta AI released the Segment Anything Model (SAM) in April 2023 and, on July 29th, 2024, released Segment Anything 2 (SAM 2), a new image and video segmentation foundation model. According to Meta, SAM 2 is more accurate than the original SAM model at image segmentation tasks while running 6x faster.
Given an input image or video, the SAM models can segment objects in the image and generate segmentation masks. You can then pass that information to other models, such as zero-shot object detection models, to label the masks of the objects you specify.
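To see what this looks like in code, below is a minimal sketch using Meta's open source segment-anything package. It assumes you have installed the package and downloaded the ViT-H checkpoint from Meta's release; the image filename is a placeholder.

```python
# A minimal sketch of generating masks for every object in an image with SAM,
# assuming the segment-anything package and the ViT-H checkpoint from Meta's
# release. "cars.jpg" is a placeholder filename.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("cars.jpg"), cv2.COLOR_BGR2RGB)

# Each entry is a dict with a boolean "segmentation" mask plus metadata
# such as "area" and "bbox".
masks = mask_generator.generate(image)
print(f"SAM proposed {len(masks)} masks")
```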
Labeling with SAM models is available in Roboflow Annotate, our tool for annotating images for computer vision tasks. We also have a deep dive covering how to use SAM in Python if you are interested in an in-depth guide to using SAM in code.
With SAM setting such a high standard for segmentation, we wanted to take a step back and ask: what are the main use cases for SAM and SAM 2? How can you use the family of SAM models to help you solve problems?
Those are the two questions we’re going to answer in this guide. Below, we walk through five use cases for SAM models. Without further ado, let’s get started!
Assisted Image Labeling
You can use a model you have already trained for a specific task in tandem with SAM to provide an assistant that recommends annotations to add to your images. This allows you to create polygon annotations without having to click on individual points around a polygon. With SAM, you can click on an object of interest, then add further clicks to refine your annotation as necessary.
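Under the hood, this click-to-segment workflow corresponds to SAM's point prompts. Here is a rough sketch, again assuming the segment-anything package and a downloaded checkpoint; the filename and click coordinates are placeholders.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One positive click (label 1) on the object of interest; adding negative
# clicks (label 0) carves regions out of the mask to refine it.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[450, 300]]),
    point_labels=np.array([1]),
    multimask_output=True,  # returns three candidate masks with quality scores
)
best_mask = masks[np.argmax(scores)]
```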
Roboflow Annotate integrates SAM out-of-the-box for paid customers, allowing you to annotate with greater precision and speed.
Zero-Shot Labeling
Zero-shot labeling refers to annotating objects in images without a model that has been trained specifically to identify those objects.
For example, you can feed SAM an image of cars on a road and SAM will be able to recommend segmentation masks for all of the cars, as well as everything else in the image. With that said, the masks will not come with labels that tell you that the cars are cars. This is because SAM segments images; it does not detect and classify objects in the way that zero-shot object detectors like Grounding DINO do.
You would then need to feed the output masks from SAM through a zero-shot object detection model like Grounding DINO, which would find all of the cars. From there, you could add labels to each of the masks of interest in your image. In the aforementioned example, you would label and send to your dataset only the masks that Grounding DINO reports contain cars.
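Here is a sketch of that filtering step, continuing from the mask generation example earlier. `detect_cars` is a hypothetical wrapper around a Grounding DINO inference call that returns `[x1, y1, x2, y2]` boxes for the prompt "car", and the 0.5 overlap threshold is an assumption you would tune.

```python
import numpy as np

def box_from_mask(segmentation: np.ndarray) -> np.ndarray:
    """Compute the [x1, y1, x2, y2] bounding box of a boolean mask."""
    ys, xs = np.where(segmentation)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

car_boxes = detect_cars("cars.jpg")  # hypothetical Grounding DINO wrapper

# Keep only the SAM masks that overlap a "car" detection; these get the
# label "car" in your dataset.
car_masks = [
    m for m in masks  # `masks` from the SamAutomaticMaskGenerator example
    if any(iou(box_from_mask(m["segmentation"]), box) > 0.5 for box in car_boxes)
]
```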
Removing Backgrounds
SAM can identify backgrounds in images with a great degree of precision. When you use SAM, you can interactively select a mask for the background. Then, you can use that information to remove the existing background from an image and replace it with a transparent background. You could then place the new image on top of a new background.
One situation in which this feature is helpful is photo editing. Consider a scenario where you have an image of a person whose background you want to change (e.g. because you want to add a colored gradient background behind the person). You could retrieve the pixels associated with the person in the image and then add a custom background.
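As a sketch, assuming `background_mask` is a boolean array for the background you selected with SAM (the filenames are placeholders):

```python
import cv2
import numpy as np

# `background_mask` is assumed: a boolean HxW array from SAM that is True
# where the background is.
image = cv2.cvtColor(cv2.imread("portrait.jpg"), cv2.COLOR_BGR2RGB)

# Add an alpha channel: transparent on the background, opaque elsewhere.
rgba = cv2.cvtColor(image, cv2.COLOR_RGB2RGBA)
rgba[..., 3] = np.where(background_mask, 0, 255).astype(np.uint8)

cv2.imwrite("cutout.png", cv2.cvtColor(rgba, cv2.COLOR_RGBA2BGRA))
```

You could then composite the resulting PNG over any new background you like.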
Inpainting
The degree of accuracy with which SAM identifies boundaries around objects makes the model an ideal partner for inpainting in image generation. Using a model like SAM, you could find the exact features of an image that you want to change, then send the masks through a model that supports inpainting like Stable Diffusion.
Consider an example where you want to replace all of the blue cars in a parking lot image with red cars, a task you may want to do if you are building a model to detect cars. You could use SAM to identify all the cars, select the masks that contain blue cars, then provide each mask as a prompt to an inpainting model. Then, you can make a request like “change the color of the car to red” to get your desired output.
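Below is a sketch of that hand-off using the Hugging Face diffusers library and the runwayml/stable-diffusion-inpainting checkpoint. `car_mask` is assumed to be a boolean array for one blue car, selected from SAM's output as described above; the filenames are placeholders.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("parking_lot.jpg").convert("RGB").resize((512, 512))

# `car_mask` is assumed: a boolean array from SAM for one blue car.
# White pixels in the mask mark the region the pipeline will repaint.
mask = Image.fromarray((car_mask * 255).astype(np.uint8)).resize((512, 512))

result = pipe(prompt="a red car", image=image, mask_image=mask).images[0]
result.save("parking_lot_red.jpg")
```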
Synthetic Data Generation
As mentioned above, you can use SAM in combination with a zero-shot object detection model like Grounding DINO. When you have masks that represent objects of interest, you can then paste them onto images with new backgrounds relevant to the environment where your model will be deployed. This will help your model learn to better identify features in your dataset.
In addition, you could use inpainting for synthetic data generation. In the inpainting example, we noted that you could change the color of cars in a parking lot to help make your model more representative of the environment in which it will operate. Let’s use another example. Suppose you are identifying defects on metal pipes. You could use SAM to identify the metal pipe in an image, then ask a model that supports inpainting to add a scratch, a dent, or another defect that you want to be able to detect.
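Here is a sketch of the cut-and-paste approach, assuming `object_mask` is a boolean SAM mask for the object of interest; the filenames and paste location are placeholders.

```python
import numpy as np
from PIL import Image

source = np.array(Image.open("source.jpg").convert("RGB"))
background = Image.open("new_background.jpg").convert("RGB")

# Build an RGBA cutout: object pixels opaque, everything else transparent.
# `object_mask` is assumed: a boolean HxW array from SAM.
alpha = np.where(object_mask, 255, 0).astype(np.uint8)
cutout = Image.fromarray(np.dstack([source, alpha]), mode="RGBA")

# Paste at a placeholder location; varying position, scale, and background
# across many composites diversifies the synthetic training set.
background.paste(cutout, (100, 150), mask=cutout)
background.save("synthetic_sample.jpg")
```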
Conclusion
At the time of writing, SAM has been out for around a week. We’re only at the beginning of exploring what is possible with this model. There are more applications to be explored with generative AI, zero-shot labeling, image captioning, and more.
If you are interested in what SAM has to offer, we recommend playing around with the model yourself! If you have images to label, Roboflow now supports SAM in the browser. With this feature, you can annotate images faster and with a greater degree of precision.