Extract Nutrition Data from Food Labels with Computer Vision
Extracting accurate nutrition data from food labels is difficult: labels vary in content, layout, and size, and vision models still have limits. Traditional OCR (Optical Character Recognition) tools struggle with this variability, but Vision Language Models (VLMs) like GPT-4o offer a context-aware alternative.
In this post, we’ll explore how to build an application that uses a vision language model to extract nutrition data from label images. We’ll use OpenAI’s GPT-4o without any fine-tuning to show what the out-of-the-box model can do.
Why vision language models (VLMs) for OCR?
Unlike traditional OCR systems that merely transcribe text, VLMs like GPT-4o combine text recognition with contextual understanding. This allows them to:
- Handle context-specific abbreviations (e.g., “sat.” for saturated, “cholest.” for cholesterol).
- Predict missing information based on patterns (e.g., identifying a food type from its nutritional profile).
- Structure data intelligently for downstream applications.
Using a VLM enhances OCR tasks, especially for unstructured or semi-structured text, making it ideal for complex datasets like food labels.
Setting up a Workflow for food label OCR
To demonstrate how VLMs excel at extracting nutrition data, we’ll walk through setting up a workflow in Roboflow. This step-by-step guide ensures you can replicate the process for your own datasets.
1. Start with Roboflow Workflows
Sign in to Roboflow and create a new workflow. Add the OpenAI block to utilize GPT-4o’s vision capabilities.
2. Set up your OpenAI API key
To use GPT-4o, you’ll need access to the OpenAI API. Here’s how to set it up:
- Visit the OpenAI API website and create an account if you don’t already have one.
- Navigate to the API Keys section in your OpenAI account settings and generate a new API key.
- Copy the API key and paste it into the OpenAI block setup in Roboflow Workflows.
This integration allows Roboflow to interact with GPT-4o, enabling its vision capabilities for your task.
3. Choose the task type
Roboflow offers multiple task types for OCR. For this use case, select the ‘open prompt’ task type instead of plain OCR. This enables GPT-4o to go beyond extracting raw text, leveraging its context-aware capabilities to deliver structured and meaningful outputs.
4. Prepare your prompt
Crafting the right prompt is key to guiding GPT-4o’s behavior. Here’s an example:
Extract text from this nutrition facts label using OCR. If the food name is missing, predict it based on the nutritional values. Make sure to use the correct unit value. Replace any missing values with 0g.
Output the result as a raw JSON string as follows:
food name: [food name], serving size: [value], calories: [value], added sugars: [value], biotin: [value], calcium: [value], chloride: [value], choline: [value], cholesterol: [value], chromium: [value], copper: [value], dietary fiber: [value], fat: [value], folate/folic acid: [value], iodine: [value], iron: [value], magnesium: [value], manganese: [value], molybdenum: [value], niacin: [value], pantothenic acid: [value], phosphorus: [value], potassium: [value], protein: [value], riboflavin: [value], saturated fat: [value], selenium: [value], sodium: [value], thiamin: [value], total carbohydrate: [value], vitamin A: [value], vitamin B6: [value], vitamin B12: [value], vitamin C: [value], vitamin D: [value], vitamin E: [value], vitamin K: [value], zinc: [value]
The nutrient fields in this prompt come from the FDA’s website, where recommended daily values are defined. Using this standardized list means that every nutrition label, regardless of what it includes or omits, is processed in a consistent format.
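Keeping the field names in a single list makes it easier to keep the prompt and any later parsing step in sync. The following is a minimal sketch of that idea; `NUTRIENT_FIELDS` and `build_prompt` are hypothetical names introduced here for illustration, and the prompt text is copied from the example above.

```python
# Field names copied from the prompt above (hypothetical constant name).
NUTRIENT_FIELDS = [
    "food name", "serving size", "calories", "added sugars", "biotin",
    "calcium", "chloride", "choline", "cholesterol", "chromium", "copper",
    "dietary fiber", "fat", "folate/folic acid", "iodine", "iron",
    "magnesium", "manganese", "molybdenum", "niacin", "pantothenic acid",
    "phosphorus", "potassium", "protein", "riboflavin", "saturated fat",
    "selenium", "sodium", "thiamin", "total carbohydrate", "vitamin A",
    "vitamin B6", "vitamin B12", "vitamin C", "vitamin D", "vitamin E",
    "vitamin K", "zinc",
]


def build_prompt(fields):
    """Assemble the extraction prompt, ending with one 'name: [value]' slot per field."""
    slots = ", ".join(
        f"{name}: [{name if name == 'food name' else 'value'}]" for name in fields
    )
    return (
        "Extract text from this nutrition facts label using OCR. "
        "If the food name is missing, predict it based on the nutritional values. "
        "Make sure to use the correct unit value. "
        "Replace any missing values with 0g.\n"
        "Output the result as a raw JSON string as follows:\n"
        f"{slots}"
    )


prompt = build_prompt(NUTRIENT_FIELDS)
```

The resulting string can be pasted into the prompt field of the OpenAI block, and the same list can be joined with commas wherever the field names are needed again.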
Most food labels don’t list every nutrient, which could create inconsistencies when aggregating data. By prompting GPT-4o to replace missing values with 0g, we can generate structured JSON outputs that include all fields, even if certain nutrients are not explicitly mentioned on the label. This approach makes the data uniform and ready for downstream use, such as meal tracking or analytics.
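The model may still omit a field now and then, so it can help to apply the same default in code after parsing. This is a small sketch under that assumption; `fill_missing` is a hypothetical helper, not part of Roboflow or OpenAI.

```python
def fill_missing(parsed, expected_fields, default="0g"):
    """Return a dict containing every expected field.

    Fields that are absent, None, or empty in `parsed` get `default`.
    """
    return {
        name: parsed[name] if parsed.get(name) not in (None, "") else default
        for name in expected_fields
    }


# Example: the label listed protein but not iron, and zinc came back empty.
record = fill_missing(
    {"protein": "25g", "zinc": ""},
    ["protein", "zinc", "iron"],
)
```

One design note: defaulting to 0g keeps every record the same shape, which is the goal stated above, but a text field such as the food name may deserve a different default in a real application.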
5. Add the JSON Parser Block
Although the prompt already instructs GPT-4o to output JSON, the model returns it as a raw string. Adding a JSON parser block in Roboflow Workflows turns that string into named fields, so the output is ready for downstream applications like database integration or analytics.
To configure the JSON parser block:
- Add the block to your workflow immediately after the OpenAI block.
- In the “Expected Fields” parameter, list all the nutrient names obtained from the FDA, separated by commas:
food name, serving size, calories, added sugars, biotin, calcium, chloride, choline, cholesterol, chromium, copper, dietary fiber, fat, folate/folic acid, iodine, iron, magnesium, manganese, molybdenum, niacin, pantothenic acid, phosphorus, potassium, protein, riboflavin, saturated fat, selenium, sodium, thiamin, total carbohydrate, vitamin A, vitamin B6, vitamin B12, vitamin C, vitamin D, vitamin E, vitamin K, zinc
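To make the role of this step concrete, here is a rough sketch of what a parser of this kind might do. It is not Roboflow's implementation: `parse_model_json` is a hypothetical function, and the rule used for `error_status` (true when parsing fails or an expected field is absent) is an assumption made for illustration.

```python
import json
import re

# A Markdown code fence, built indirectly so this example stays readable.
FENCE = "`" * 3
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, flags=re.DOTALL)


def parse_model_json(raw, expected_fields):
    """Parse a model reply into the expected fields plus an error_status flag."""
    # Models sometimes wrap JSON in a Markdown fence; unwrap it if present.
    match = _FENCED.search(raw)
    text = match.group(1) if match else raw
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return {"error_status": True}
    if not isinstance(data, dict):
        return {"error_status": True}
    # Keep only the expected fields; absent ones become None.
    result = {name: data.get(name) for name in expected_fields}
    result["error_status"] = any(name not in data for name in expected_fields)
    return result


reply = FENCE + 'json\n{"calories": "160", "protein": "25g"}\n' + FENCE
parsed = parse_model_json(reply, ["calories", "protein"])
```

Checking `error_status` before storing a record is a cheap way to catch replies that are not valid JSON or that dropped a field.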
Testing the Workflow
Let’s apply this workflow to a nutrition label.
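A deployed workflow can also be called from code. The sketch below assumes the `inference-sdk` Python package and its `InferenceHTTPClient.run_workflow` method; the API URL, workspace name, workflow ID, API key, and the `image` input name are placeholders that depend on your own setup. `parser_outputs` is a hypothetical helper that assumes the list-shaped response shown in the examples in this post.

```python
def run_label_workflow(image_path, api_key, workspace_name, workflow_id):
    """Send one image to a deployed workflow and return the raw response."""
    # Imported lazily so parser_outputs can be used without the SDK installed.
    from inference_sdk import InferenceHTTPClient

    client = InferenceHTTPClient(
        api_url="https://detect.roboflow.com",  # assumed hosted endpoint
        api_key=api_key,
    )
    return client.run_workflow(
        workspace_name=workspace_name,
        workflow_id=workflow_id,
        images={"image": image_path},  # "image" is an assumed input name
    )


def parser_outputs(response):
    """Collect the value of every key starting with 'json_parser', in order."""
    found = []
    for item in response:
        for key, value in item.items():
            if key.startswith("json_parser"):
                found.append(value)
    return found


# Offline example using a response shaped like the ones in this post.
sample_response = [{"json_parser": {"food name": "Protein Supplement"}}]
labels = parser_outputs(sample_response)
```

With real credentials, `parser_outputs(run_label_workflow("label.jpg", "YOUR_API_KEY", "your-workspace", "your-workflow-id"))` would return the parsed labels; all four arguments there are placeholders.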
Example 1: Single nutrition label
We use a food label missing the food name.
Output:
"json_parser": {
"food name": "Protein Supplement",
"serving size": "39.6g",
"calories": "160",
"added sugars": "0g",
"biotin": "0g",
"calcium": "180mg",
"chloride": "0g",
"choline": "0g",
"cholesterol": "95mg",
"chromium": "0g",
"copper": "0g",
"dietary fiber": "0g",
"fat": "2.5g",
"folate/folic acid": "0g",
"iodine": "0g",
"iron": "0.3mg",
"magnesium": "0g",
"manganese": "0g",
"molybdenum": "0g",
"niacin": "0g",
"pantothenic acid": "0g",
"phosphorus": "0g",
"potassium": "170mg",
"protein": "25g",
"riboflavin": "0g",
"saturated fat": "1.5g",
"selenium": "0g",
"sodium": "110mg",
"thiamin": "0g",
"total carbohydrate": "9g",
"vitamin A": "0g",
"vitamin B6": "0g",
"vitamin B12": "0g",
"vitamin C": "0g",
"vitamin D": "0g",
"vitamin E": "0g",
"vitamin K": "0g",
"zinc": "0g",
"error_status": false
}
GPT-4o labeled the food a protein supplement, most likely because of its high protein-to-serving-size ratio (25g of protein in a 39.6g serving), showing that it can fill in missing information from context.
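The model's internal reasoning is not observable, but the ratio itself is easy to compute from the parsed output. The sketch below uses a hypothetical `parse_quantity` helper to split strings such as "180mg" into a number and a unit, then computes the protein share of the serving from the values shown above.

```python
import re

# A leading number followed by an optional run of unit letters.
_QUANTITY = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([A-Za-z%]*)")


def parse_quantity(text):
    """Split '180mg' into (180.0, 'mg'); return None if no number leads."""
    match = _QUANTITY.match(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


# Values taken from the parsed output above.
label = {"serving size": "39.6g", "protein": "25g"}
protein_grams, _ = parse_quantity(label["protein"])
serving_grams, _ = parse_quantity(label["serving size"])
protein_share = protein_grams / serving_grams  # roughly 0.63
```

Roughly 63 percent of the serving is protein, which is consistent with a protein supplement. Note that this helper only reads the first number and unit, so a compound serving size such as "8 fl oz (240 mL)" would need extra handling.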
Example 2: Processing multiple labels
We tested the workflow on five labels simultaneously: protein powder, milk, banana, Greek yogurt, and instant oats. None of the labels explicitly mentioned the food name.
Results:
[
{
"output": {
"food name": "instant oats",
"serving size": "40g",
"calories": "150",
"added sugars": "0g",
"biotin": "0g",
"calcium": "20g",
"chloride": "0g",
"choline": "0g",
"cholesterol": "0g",
"chromium": "0g",
"copper": "0g",
"dietary fiber": "4g",
"fat": "3g",
"folate/folic acid": "0g",
"iodine": "0g",
"iron": "1.5g",
"magnesium": "40g",
"manganese": "0g",
"molybdenum": "0g",
"niacin": "0g",
"pantothenic acid": "0g",
"phosphorus": "130g",
"potassium": "150mg",
"protein": "5g",
"riboflavin": "0g",
"saturated fat": "0.5g",
"selenium": "0g",
"sodium": "0g",
"thiamin": "0.2g",
"total carbohydrate": "27g",
"vitamin A": "0g",
"vitamin B6": "0g",
"vitamin B12": "0g",
"vitamin C": "0g",
"vitamin D": "0g",
"vitamin E": "0g",
"vitamin K": "0g",
"zinc": "0g",
"error_status": false
},
"json_parser_1": {
"food name": "Greek Yogurt",
"serving size": "170g",
"calories": "100",
"added sugars": "0g",
"biotin": "0g",
"calcium": "190mg",
"chloride": "0g",
"choline": "0g",
"cholesterol": "10mg",
"chromium": "0g",
"copper": "0g",
"dietary fiber": "0g",
"fat": "0g",
"folate/folic acid": "0g",
"iodine": "0g",
"iron": "0g",
"magnesium": "0g",
"manganese": "0g",
"molybdenum": "0g",
"niacin": "0g",
"pantothenic acid": "0g",
"phosphorus": "0g",
"potassium": "180mg",
"protein": "18g",
"riboflavin": "0g",
"saturated fat": "0g",
"selenium": "0g",
"sodium": "60mg",
"thiamin": "0g",
"total carbohydrate": "7g",
"vitamin A": "0g",
"vitamin B6": "0g",
"vitamin B12": "0g",
"vitamin C": "0g",
"vitamin D": "0g",
"vitamin E": "0g",
"vitamin K": "0g",
"zinc": "0g",
"error_status": false
},
"json_parser_2": {
"food name": "banana",
"serving size": "136g",
"calories": "120",
"added sugars": "0g",
"biotin": "0g",
"calcium": "7mg",
"chloride": "0g",
"choline": "0g",
"cholesterol": "0g",
"chromium": "0g",
"copper": "0g",
"dietary fiber": "4g",
"fat": "0g",
"folate/folic acid": "0g",
"iodine": "0g",
"iron": "0mg",
"magnesium": "0g",
"manganese": "0g",
"molybdenum": "0g",
"niacin": "0g",
"pantothenic acid": "0g",
"phosphorus": "0g",
"potassium": "487mg",
"protein": "1g",
"riboflavin": "0g",
"saturated fat": "0g",
"selenium": "0g",
"sodium": "0mg",
"thiamin": "0g",
"total carbohydrate": "31g",
"vitamin A": "0g",
"vitamin B6": "0g",
"vitamin B12": "0g",
"vitamin C": "0g",
"vitamin D": "0g",
"vitamin E": "0g",
"vitamin K": "0g",
"zinc": "0g",
"error_status": false
},
"json_parser_3": {
"food name": "2 percent milk",
"serving size": "8 fl oz (240 mL)",
"calories": "120",
"added sugars": "0g",
"biotin": "0g",
"calcium": "290mg",
"chloride": "0g",
"choline": "0g",
"cholesterol": "20mg",
"chromium": "0g",
"copper": "0g",
"dietary fiber": "0g",
"fat": "5g",
"folate/folic acid": "0g",
"iodine": "0g",
"iron": "0mg",
"magnesium": "0g",
"manganese": "0g",
"molybdenum": "0g",
"niacin": "0g",
"pantothenic acid": "0g",
"phosphorus": "0g",
"potassium": "380mg",
"protein": "8g",
"riboflavin": "0g",
"saturated fat": "3g",
"selenium": "0g",
"sodium": "105mg",
"thiamin": "0g",
"total carbohydrate": "12g",
"vitamin A": "150mcg",
"vitamin B6": "0g",
"vitamin B12": "0g",
"vitamin C": "0g",
"vitamin D": "2.5mcg",
"vitamin E": "0g",
"vitamin K": "0g",
"zinc": "0g",
"error_status": false
},
"json_parser_4": {
"food name": "Protein powder supplement",
"serving size": "39.6g",
"calories": "160",
"added sugars": "0g",
"biotin": "0g",
"calcium": "180mg",
"chloride": "0g",
"choline": "0g",
"cholesterol": "95mg",
"chromium": "0g",
"copper": "0g",
"dietary fiber": "0g",
"fat": "2.5g",
"folate/folic acid": "0g",
"iodine": "0g",
"iron": "0.3mg",
"magnesium": "0g",
"manganese": "0g",
"molybdenum": "0g",
"niacin": "0g",
"pantothenic acid": "0g",
"phosphorus": "0g",
"potassium": "170mg",
"protein": "25g",
"riboflavin": "0g",
"saturated fat": "1.5g",
"selenium": "0g",
"sodium": "110mg",
"thiamin": "0g",
"total carbohydrate": "9g",
"vitamin A": "0g",
"vitamin B6": "0g",
"vitamin B12": "0g",
"vitamin C": "0g",
"vitamin D": "0g",
"vitamin E": "0g",
"vitamin K": "0g",
"zinc": "0g",
"error_status": false
}
}
]
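All five foods are correctly identified, and the workflow even labels the milk as 2 percent, presumably based on its fat content. Most values are parsed correctly, but the units are not always reliable: in the instant oats output, calcium, iron, magnesium, phosphorus, and thiamin are reported in grams (for example, 130g of phosphorus in a 40g serving), where milligrams are almost certainly intended. Check the units before using the data downstream.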
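A coarse plausibility check can catch some unit mistakes automatically: no single nutrient can weigh more than the serving it comes from. The sketch below applies that one rule to values taken from the instant oats output above; `to_grams` and `implausible_fields` are hypothetical helpers, and the rule is only a lower bar, not a full validation.

```python
import re

# Conversion factors to grams for the mass units that appear in the outputs.
TO_GRAMS = {"g": 1.0, "mg": 1e-3, "mcg": 1e-6}
_MASS = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*(mcg|mg|g)\s*$")


def to_grams(text):
    """Convert '150mg' to grams; return None for anything that is not a mass."""
    match = _MASS.match(text)
    if match is None:
        return None
    return float(match.group(1)) * TO_GRAMS[match.group(2)]


def implausible_fields(label):
    """List fields whose mass is larger than the serving size."""
    serving = to_grams(label.get("serving size", ""))
    if serving is None:
        return []  # serving size is not a plain mass, so the rule cannot apply
    flagged = []
    for name, value in label.items():
        if name == "serving size" or not isinstance(value, str):
            continue
        grams = to_grams(value)
        if grams is not None and grams > serving:
            flagged.append(name)
    return flagged


# Subset of the instant oats output above.
oats = {
    "food name": "instant oats", "serving size": "40g", "calories": "150",
    "calcium": "20g", "iron": "1.5g", "magnesium": "40g",
    "phosphorus": "130g", "potassium": "150mg", "protein": "5g",
}
suspect = implausible_fields(oats)
```

Here only phosphorus is flagged, since 130g exceeds the 40g serving. Values such as 20g of calcium slip through this rule, so a production check would likely also compare each nutrient against a typical range.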
Practical applications
This workflow is useful for developers working on:
- Personalized Diet Apps: Automate nutrition tracking and meal recommendations.
- Grocery Management Systems: Parse product data for inventory or labeling purposes.
- Health Research: Quickly digitize nutrition data for analysis.
By leveraging VLMs like GPT-4o, developers can build smarter, more efficient solutions tailored to real-world needs.
Try it yourself
Vision Language Models like GPT-4o are transforming how we extract and structure data from complex, unstructured sources like food labels. Alongside GPT-4o, Roboflow Workflows also supports Microsoft Florence-2, Anthropic Claude, and Google Gemini, offering flexibility and fine-tuning options for more specialized use cases.
For this tutorial, we’ve focused on the OpenAI block to showcase its powerful capabilities in handling context-aware OCR. Whether you’re building personalized diet apps, grocery management systems, or health research tools, VLM integration can simplify your workflow and accelerate your development process.
Ready to build your computer vision project?
- Try Roboflow for free and explore our tutorials to get started.
- Join our community to share experiences, ask questions, and learn from others.