Using Computer Vision to Understand Food and Cuisines

This article was contributed to the Roboflow blog by Abirami Vina

Introduction

Exploring global cuisines is a great way to learn about different cultures. Each dish and recipe tells a story of traditions, history, and the unique flavors of its region.

Image classification, a core task in computer vision, can be applied to help understand different foods and cuisines. The results from an image classification model can be used in applications that help people identify the cuisine of a given dish.

An example of computer vision being used to classify food. Source.

In this article, we'll discuss the importance of understanding and exploring different cuisines and see how image classification can be used through a simple Streamlit application to identify and provide information about different dishes. Let’s get started!

The Significance of Culinary Diversity

Cuisines reflect a community's values, history, and social dynamics. They tell the story of our ancestors' practices and memories, and modern blends of different cuisines show how cultures are connected.

Culinary diversity is important because it encourages global unity and economic growth. It celebrates traditions and supports local economies through tourism and agriculture. Food also strengthens community bonds and inspires kitchen creativity. 

Technological advancements, like computer vision, can help bridge cultural gaps by offering immersive experiences and fostering global understanding and collaboration.

Using Computer Vision to Enhance Culinary Exploration

In this guide, we are going to create an application that uses image classification to identify cuisines. We will use Roboflow Universe to find a model, then Streamlit to use our model in an application.

The application we build will assist users in identifying the dish shown in their images, enhancing their understanding of various cuisines.

In this guide, we’ll focus on how to apply an image classification model rather than how to train your own model. For more information on creating your own image classification model, take a look at our guide on image classification.

A Trained Image Classification Model

We'll be using a trained food image classification model from Roboflow Universe, a computer vision platform that offers over 200,000 open-source datasets and 50,000 ready-to-use models. To begin using these resources, set up a Roboflow account and head to the model page we are interested in, as shown below.

When you scroll down on the page, you'll find a code snippet demonstrating how to use the API with this model. It's important to take note of the model ID and version number, which you can find in the code.

For instance, in our example, the model is identified as "food-classify-wwzd3" and is in its first version. Make a note of these details; we’ll need them when we put together our inference script.

Developing the Application

Let’s walk through the different steps in developing an application to help explore global cuisines. 

Step 1: Setting Up the Requirements

Let’s start by installing the needed dependencies. We also install openpyxl, which pandas uses to read the Excel file later in this guide. Run the following command:

pip install streamlit pandas Pillow roboflow openpyxl

Step 2: Understanding the Classes

Next, navigate to the overview page of the trained model. This page will list the different classes that the model can classify, as shown below.

The classes represent the names of various Vietnamese and Korean food items. We can obtain more information about them from ChatGPT, which we can then display in our Streamlit application after the image classification model performs an inference on an image. 

For example, the following data is from ChatGPT:

Bap Bo Bam, known as cooked rice in English, is a staple food in Korean cuisine and is often served as a side dish or as a base for other dishes. Rice holds a central place in Korean cuisine and is a staple food in Korean households. It serves as the foundation for many meals and is often accompanied by a variety of side dishes. Additionally, rice is used in the preparation of various dishes such as bibimbap (a mixed rice dish with vegetables, meat, and often a fried egg on top) and kimbap (Korean rice rolls with vegetables and sometimes meat).

Our aim is to be able to upload an image of food like Bap Bo Bam to our Streamlit application, and then display the name 'Bap Bo Bam' along with information about it so that users can learn more about Vietnamese and Korean cuisine.

Step 3: Building the Streamlit Application

First, we’ll import the required packages as shown below.

import streamlit as st
import pandas as pd
from PIL import Image
from roboflow import Roboflow

Next, we need to initialize the image classification model we are going to use. Remember to replace ROBOFLOW_API_KEY with your Roboflow API key. Refer to the Roboflow documentation for instructions on how to get your API key.

# Replace ROBOFLOW_API_KEY with your Roboflow API key

rf = Roboflow(api_key="ROBOFLOW_API_KEY")
project = rf.workspace().project("food-classify-wwzd3")
model = project.version(1).model
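
Hard-coding the key works for a quick demo, but you may prefer to read it from an environment variable so it doesn't end up in your source code. Here's a minimal sketch, assuming the key is exported as ROBOFLOW_API_KEY in your shell:

import os

# Read the API key from the environment instead of hard-coding it
rf = Roboflow(api_key=os.environ["ROBOFLOW_API_KEY"])
project = rf.workspace().project("food-classify-wwzd3")
model = project.version(1).model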

We will need a reference database that we can use to map the results of our model (i.e., a classification that a given dish is Vietnamese) to more information that we can show to the user.

Download an Excel sheet with the data related to the food classes here, and create a data dictionary using the following code. We can use this dictionary to look up information about different cuisines identified by our model.

# Read data from the Excel file
excel_file_path = "path_to_food.xlsx"
df = pd.read_excel(excel_file_path)
class_mapping = dict(zip(df['Class'], zip(df['English Name'], df['Origin'])))
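
To make the mapping concrete, here is a rough sketch of what the resulting dictionary might contain and how a lookup works. The class names and values below are illustrative placeholders; the real keys come from the "Class" column of the downloaded spreadsheet.

# Illustrative structure of class_mapping (class names and values are placeholders)
example_mapping = {
    "mi xao": ("Stir-fried noodles", "Vietnam"),
    "bap bo bam": ("Cooked rice", "Korea"),
}

english_name, origin = example_mapping["mi xao"]
print(english_name)  # Stir-fried noodles
print(origin)        # Vietnam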

Finally, we can set up the core functionality of our Streamlit web application. It starts by defining the app's title. The application allows users to upload images of food items in common formats like JPG, JPEG, or PNG.

Once an image is uploaded, it is displayed on the app for the user to view. The key feature is the 'Run Inference' button, which, when clicked, triggers the image classification process. This process involves sending the image to a Roboflow model, which then returns predictions about the food item in the image.

The app subsequently displays these results, including the class (type of cuisine), and if available, additional information like the English name and origin of the dish.

# Streamlit app code
st.title("Using Computer Vision to Understand Food and Cuisines")

# Upload image through Streamlit
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])

# Display the uploaded image
if uploaded_file is not None:
    st.image(uploaded_file, caption="Uploaded Image.", use_column_width=True)

    # Run Roboflow inference when the user clicks a button
    if st.button("Run Inference"):
        # Convert the uploaded file to a format that Roboflow can use
        image = Image.open(uploaded_file)
        image.save("uploaded_image.jpg")

        # Perform inference using Roboflow
        prediction = model.predict("uploaded_image.jpg").json()

        # Display the results
        if "predictions" in prediction and prediction["predictions"]:

            # Get the predicted class
            predicted_class = prediction["predictions"][0]["predictions"][0]["class"]
             
            print(predicted_class)

            if predicted_class is not None:
                # Check if the predicted class is in the mapping
                if predicted_class in class_mapping:
                    english_name, origin = class_mapping[predicted_class]
                    st.write(f"You've uploaded an image of {predicted_class.capitalize()}:")
                    st.write(f"English Name: {english_name}")
                    st.write(f"Origin: {origin}")
                else:
                    st.warning("No information found for the predicted class.")
            else:
                st.warning("No class information found in the prediction.")

            # Optionally, check if 'annotations' key is present
            if "annotations" in prediction and "image" in prediction["annotations"]:
                annotated_image_url = prediction["annotations"]["image"]

                # Display annotated image
                st.image(annotated_image_url, caption="Annotated Image.", use_column_width=True)
            
        else:
            st.warning("No predictions found.")

Step 4: Running the Application

Now that we have an application ready, we can start using it. Use the following command to run your Streamlit application:

streamlit run your_python_filename.py

Your Streamlit application will start to run. A message will appear in your terminal informing you of how you can access your application.

You can view your Streamlit application at http://localhost:8501/:

You can then upload an image to test your application, as shown below.

Then, click the “Run Inference” button, and your results will be displayed below.

The results show that the uploaded image is of Mi xao, known in English as stir-fried noodles, and provide information about the dish's origin. Congratulations, you’ve just built a computer vision-enabled Streamlit application for learning about different foods and cuisines!

Conclusion

By using computer vision to help build tools that people can use to explore culinary arts, we've opened up a new dimension of cultural appreciation and education. The tool we’ve created identifies various dishes and provides a gateway to their histories and cultural significance, fostering a deeper global connection.

Using computer vision in our food experiences opens up exciting possibilities. It could change how we cook at home and help out in professional kitchens. It can be used to check food quality and analyze what's in our meals.

As AI keeps improving, we'll be able to understand and appreciate foods from around the world in more detail, helping us connect with different cultures through their cuisines. This project is just the tip of the iceberg when it comes to food and computer vision.