How to use MobileNetV4 for Classification
Published Jul 22, 2024 • 3 min read

MobileNetV4 is a state-of-the-art convolutional neural network architecture designed for efficient mobile and edge device performance, offering a balance between high accuracy and low computational cost.

MobileNetV4 was developed by researchers at Google. At the time of writing, however, official pre-trained weights have not been released. The timm library, whose models are hosted on Hugging Face, has trained its own weights using the MobileNetV4 architecture. These weights achieve strong accuracy on classification tasks, so we will use them in this guide to demonstrate MobileNetV4 in use.

In this article, we will guide you through the process of using MobileNetV4 for classification. We will focus on using pre-trained weights in this guide. We will not cover fine-tuning your own model.

Without further ado, let's get started!

What is MobileNetV4?

MobileNetV4 is an image classification model developed by Google. It is the latest iteration of the MobileNet family, a line of computer vision models built for smaller (mobile) applications. The model was trained on the ImageNet-1k dataset and achieves state-of-the-art results despite its small size and low resource requirements.

Across a range of mobile devices, MobileNetV4 outperforms many other image classification models in terms of accuracy, from MobileNet MultiAVG to FastViT. MobileNetV4 achieves up to a 75% reduction in parameter count, significantly lowering its size. The model is also 3-4 times faster than previous MobileNet models as well as other lightweight models.

Step #1: Install libraries

First, we need to install the dependencies we will use in our project. Run the following command:

!pip install transformers timm torch pillow

Step #2: Import libraries

Next, add the following lines of code to import the libraries we need.

from urllib.request import urlopen
from PIL import Image
import timm
import torch

Step #3: Get the Image

In this step, we load an image from a URL using Pillow, one of the libraries we installed.

test_img_url1 = 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
test_img1 = Image.open(urlopen(test_img_url1))
test_img1

In the code snippet above, replace the URL with a link to the image you want to classify. Run the code in Colab (or another notebook environment) to view the image.
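If your image is stored locally rather than at a URL, you can open it directly with Pillow instead. A minimal sketch, where the path is a placeholder you would replace with your own file:

# Alternative: load a local image instead of fetching one from a URL.
# "path/to/your/image.jpg" is a placeholder; point it at your own file.
test_img1 = Image.open("path/to/your/image.jpg").convert("RGB")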

Step #4: Get Labels

MobileNetV4 was trained on ImageNet-1k, so we need the ImageNet class labels. You can gather them from a labels file on GitHub.

The labels cover a wide range of categories, from animals like the red fox to objects like kites and cappuccinos, though a large share are animal-related.

Store the labels, ordered by class index, in a variable called "image_net_labels".
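Below is a minimal sketch for building image_net_labels. It assumes the commonly used imagenet_classes.txt file from the PyTorch hub repository; any ImageNet-1k label list ordered by class index will work just as well.

# Download the ImageNet-1k labels and store them in a list ordered by class index.
# This URL is one commonly used copy of the labels, not the only option.
labels_url = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
image_net_labels = urlopen(labels_url).read().decode("utf-8").splitlines()

print(len(image_net_labels))  # 1000 classes
print(image_net_labels[0])    # first class: "tench"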

Step #5: Build the classification function

In this step, we build a function that will run MobileNetV4 on any image.

def predict_with_mobilenetv4(img):
    # Load the pre-trained MobileNetV4 model from the Hugging Face Hub via timm
    model_name = "hf_hub:timm/mobilenetv4_hybrid_large.ix_e600_r384_in1k"
    model = timm.create_model(model_name, pretrained=True)
    model = model.eval()

    # Get model-specific transforms (normalization, resize)
    data_config = timm.data.resolve_data_config({}, model=model)
    transform = timm.data.create_transform(**data_config)

    # Apply the transforms and add a batch dimension
    input_tensor = transform(img).unsqueeze(0)

    # Forward pass through the model
    with torch.no_grad():
        output = model(input_tensor)

    # Get the top-5 probabilities and class indices
    top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1), k=5)

    # Convert probabilities to percentages
    top5_probabilities = top5_probabilities * 100

    # Print the raw top-5 probabilities and class indices
    print("Top-5 probabilities:")
    print(top5_probabilities)

    print("Top-5 class indices:")
    print(top5_class_indices)

    # Drop the batch dimension and convert the tensors to Python lists
    list_form_c = top5_class_indices[0].tolist()
    list_form_p = top5_probabilities[0].tolist()

    # Map the class indices to the corresponding ImageNet-1k labels
    predictions = []
    for i in range(5):
        predictions.append([image_net_labels[list_form_c[i]], round(list_form_p[i], 2)])
    print(predictions)

The code above:

  • Uses timm to load the model from the Hugging Face Hub
  • Runs a forward pass to get the model output
  • Retrieves the top-5 classes and probabilities
  • Maps the class indices to their ImageNet-1k labels
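
One design note: the function above reloads the model and transforms on every call. For repeated inference, a simple refactor (sketched below, under the same assumptions as above) loads them once and reuses them:

# Optional refactor: load the model and transforms once, then reuse them,
# so repeated calls skip the expensive model construction step.
model = timm.create_model(
    "hf_hub:timm/mobilenetv4_hybrid_large.ix_e600_r384_in1k", pretrained=True
).eval()
data_config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**data_config)

def predict(img):
    with torch.no_grad():
        output = model(transform(img).unsqueeze(0))
    probs, indices = torch.topk(output.softmax(dim=1) * 100, k=5)
    return [
        [image_net_labels[idx], round(prob, 2)]
        for idx, prob in zip(indices[0].tolist(), probs[0].tolist())
    ]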

Step #6: Test the model

Use the function we defined above to run inference on different images. Here, I call the function on test_img1:

predict_with_mobilenetv4(test_img1)

Using the ImageNet-1k labels, the model predicts the image to be an espresso, with a confidence of around 58%.
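
The checkpoint we used is the largest hybrid variant; timm also ships smaller MobileNetV4 variants if you want to trade some accuracy for speed. You can list the available pretrained checkpoints like this:

# List all MobileNetV4 checkpoints with pretrained weights available in timm.
print(timm.list_models("mobilenetv4*", pretrained=True))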

Conclusion

In this guide, we demonstrated how to use MobileNetV4 with unofficial pre-trained weights from the timm library, hosted on the Hugging Face Hub. By following the steps above, you can use MobileNetV4 for classification tasks.

You can check out our other notebooks and tutorials for more content on computer vision models.

Cite this Post

Use the following entry to cite this post in your research:

Nathan Yan. (Jul 22, 2024). How to use MobileNetV4 for Classification. Roboflow Blog: https://blog.roboflow.com/how-to-use-mobilenetv4-for-classification/

Discuss this Post

If you have any questions about this blog post, start a discussion on the Roboflow Forum.

Written by

Nathan Yan
Growth Engineer @Roboflow! Learning, Writing, Coding...