Detect and Describe Flowers with Computer Vision and Generative AI

If you've ever been curious about the flowers you come across on your walks, in your garden, or in the wild, this application offers a fun and educational way to learn more about them. By following the steps outlined in this blog post, you'll learn how to create your own flower detection and description application, allowing you to identify and learn about various flower species with just a snap of a photo.

In this blog post, you will learn how to build an application using computer vision and Generative AI. The project we build in this guide combines the power of a custom model created with Roboflow for flower detection and the ChatGPT API for providing detailed information about the detected flowers.

By following the instructions given here, you'll also gain insight into how to apply similar techniques to create educational and informational applications for other objects of interest. Whether it's birds, animals, landmarks, or everyday items, the principles behind this project can be extended to build applications that provide valuable insights into the world around us.

How the Project Works

Our flower detection and description system works as follows:

  1. Initialization: The JavaScript code initializes the application by accessing the user's webcam and loading a pre-trained model for flower detection from Roboflow.
  2. Video Stream Processing: Once the webcam stream is set up, the system continuously captures frames from the video stream. Each frame is processed through the pre-trained model to detect flowers within the frame. Bounding boxes and labels are drawn around detected flowers.
  3. User Interaction: When a flower is detected, the user has the option to click a button labeled "Show Flower Info" to request information about the detected flower.
  4. Data Processing and ChatGPT Integration: Upon clicking the button, the system extracts the relevant information about the detected flower (such as its class name) and formulates a prompt to ask the ChatGPT API. The system sends the prompt to the ChatGPT API, which generates a response containing botanical information about the detected flower.
  5. Displaying Information: The response from the ChatGPT API is displayed on the screen, providing educational details about the identified flower.

This architecture is illustrated in the following image:

Architecture of flower detection and description project.
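Condensed to its essentials, the flow looks roughly like the following sketch. This is pseudocode-style JavaScript intended only to show the shape of the pipeline; the helper functions (drawBoxesAndLabels, pickHighestConfidence, askChatGPT, displayOnScreen) are placeholders, and the full implementation appears in Step #3.

// Pseudocode-style sketch of the pipeline; the helper functions are placeholders.
roboflow
    .auth({ publishable_key: "ROBOFLOW_API_KEY" })
    .load({ model: "flowers-ujm4o", version: 2 })
    .then(function (model) {
        // Steps 1-2: detect flowers on every frame and draw the results.
        function detectLoop() {
            model.detect(video).then(function (predictions) {
                drawBoxesAndLabels(predictions);   // bounding boxes + class labels
                requestAnimationFrame(detectLoop); // schedule the next frame
            });
        }
        detectLoop();

        // Steps 3-5: on "Show Flower Info", ask ChatGPT about the top detection.
        $("#btnobj").click(async function () {
            const predictions = await model.detect(video);
            const best = pickHighestConfidence(predictions);
            const prompt = "What is " + best.class + "? Give botanical information.";
            displayOnScreen(await askChatGPT(prompt)); // render the ChatGPT reply
        });
    });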

Steps for building the project

To create this project, we need to:

  • Collect and label a flower dataset
  • Train an object detection model
  • Build an application to detect flowers and generate information

Step #1: Collect and label a flower dataset

Images of roses, lilies, daisies, and sunflowers were collected manually and uploaded to Roboflow for labeling.

Flower Dataset
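The upload here was done manually through the Roboflow UI, but it can also be scripted against Roboflow's image upload endpoint. The sketch below is a rough Node.js example (Node 18+ for the global fetch; the file name and split value are placeholders), not the exact workflow used for this project:

// upload.js - sketch of uploading one image to the Roboflow project via its upload API.
const fs = require("fs");

const API_KEY = process.env.ROBOFLOW_API_KEY; // private API key (not the publishable key)
const image = fs.readFileSync("rose-01.jpg", { encoding: "base64" });

fetch(
    "https://api.roboflow.com/dataset/flowers-ujm4o/upload" +
        "?api_key=" + API_KEY + "&name=rose-01.jpg&split=train",
    {
        method: "POST",
        headers: { "Content-Type": "application/x-www-form-urlencoded" },
        body: image
    }
)
    .then((response) => response.json())
    .then((data) => console.log(data)); // contains the new image ID on success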

Once the flower images are uploaded, they are labeled with bounding boxes for each flower class using the Roboflow annotation tool.

Dataset Labeling

Step #2: Train an Object Detection model

After completing the labeling process, a dataset version is generated and the model is trained using Roboflow's auto-training feature, achieving a training accuracy of 99.5%.

Model Metrics

The following graph shows the model's training progress.

Model Training

The model is automatically deployed to a cloud API. Roboflow provides various options for testing and deploying the model, including live testing in a web browser and deployment to edge devices. The accompanying image demonstrates the model undergoing testing via Roboflow's web interface.

Testing the model in Roboflow
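Because the model is served from Roboflow's hosted inference API, you can also sanity-check it outside the browser, for example from Node.js. Here is a minimal sketch (Node 18+ assumed; "test-flower.jpg" is a placeholder):

// infer.js - sketch of querying the hosted inference endpoint with a local image.
const fs = require("fs");

const image = fs.readFileSync("test-flower.jpg", { encoding: "base64" });

fetch("https://detect.roboflow.com/flowers-ujm4o/2?api_key=" + process.env.ROBOFLOW_API_KEY, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: image
})
    .then((response) => response.json())
    .then((data) => console.log(data.predictions)); // class, confidence, and bounding box per flower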

Step #3: Build an application to detect flowers and generate information

This step involves building the application that detects flowers in a live camera feed. We will build a JavaScript app using the roboflow.js library from this post, adapting the hand detector code from this post. The full source of main.js is shown below.
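Note that main.js expects a host page that provides the video element and an fps counter, and that loads jQuery and roboflow.js before it runs. Below is a minimal index.html sketch; the CDN versions are examples (check the roboflow.js documentation for the current script URL), and styling is omitted.

<!DOCTYPE html>
<html>
<head>
    <!-- index.html - minimal host page sketch for main.js; CDN versions are examples. -->
    <script src="https://code.jquery.com/jquery-3.6.4.min.js"></script>
    <script src="https://cdn.roboflow.com/0.2.26/roboflow.js"></script>
    <script src="main.js" defer></script>
</head>
<body class="loading">
    <video autoplay muted playsinline></video> <!-- webcam feed read by main.js -->
    <div id="fps"></div>                       <!-- frames-per-second readout -->
</body>
</html>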

In this code, you need to set your OpenAI API key in the following variable. Be aware that a key embedded in client-side JavaScript is visible to anyone who inspects the page, so this setup is only suitable for local experimentation:

var OPENAI_API_KEY = "OPENAI_KEY";

And set your Roboflow publishable API key for roboflow.js:

var publishable_key = "ROBOFLOW_API_KEY";

You may also update the ChatGPT prompt in the following variable:

const text = "What is " + objectName + "? Give botanical information.";
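As noted above, an OpenAI key embedded in the browser is only acceptable for local experimentation. If you want to keep the key server-side, one option is a small proxy that the page calls instead of api.openai.com. The sketch below uses Node 18+ and a hypothetical /flower-info route:

// proxy.js - sketch of a server-side proxy that keeps the OpenAI key off the client.
// Assumes Node 18+ (global fetch); the /flower-info route and port are hypothetical.
const http = require("http");

const OPENAI_API_KEY = process.env.OPENAI_API_KEY; // the key stays on the server

http.createServer(function (req, res) {
    if (req.method !== "POST" || req.url !== "/flower-info") {
        res.writeHead(404);
        return res.end();
    }
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", async function () {
        try {
            const { prompt } = JSON.parse(body); // e.g. { "prompt": "What is Rose? ..." }
            const response = await fetch("https://api.openai.com/v1/chat/completions", {
                method: "POST",
                headers: {
                    "Content-Type": "application/json",
                    Authorization: `Bearer ${OPENAI_API_KEY}`
                },
                body: JSON.stringify({
                    model: "gpt-3.5-turbo",
                    messages: [{ role: "user", content: prompt }]
                })
            });
            const data = await response.json();
            res.writeHead(200, { "Content-Type": "application/json" });
            res.end(JSON.stringify({ answer: data.choices[0].message.content }));
        } catch (err) {
            res.writeHead(500);
            res.end();
        }
    });
}).listen(3000);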

Here is the updated source code from main.js:

$(function () {
    const video = $("video")[0];

    var model;
    var cameraMode = "environment"; // or "user"

    const startVideoStreamPromise = navigator.mediaDevices
        .getUserMedia({
            audio: false,
            video: {
                facingMode: cameraMode
            }
        })
        .then(function (stream) {
            return new Promise(function (resolve) {
                video.srcObject = stream;
                video.onloadeddata = function () {
                    video.play();
                    resolve();
                };
            });
        });

    var OPENAI_API_KEY = "OPENAI_KEY"; // OpenAI API key used for the ChatGPT request below
    var publishable_key = "ROBOFLOW_API_KEY";
    var toLoad = {
        model: "flowers-ujm4o",
        version: 2
    };

    const loadModelPromise = new Promise(function (resolve, reject) {
        roboflow
            .auth({
                publishable_key: publishable_key
            })
            .load(toLoad)
            .then(function (m) {
                model = m;
                resolve();
            })
            .catch(reject); // surface model-loading failures instead of hanging forever
    });

    Promise.all([startVideoStreamPromise, loadModelPromise]).then(function () {
        $("body").removeClass("loading");
        resizeCanvas();
        detectFrame();
    });

    var canvas, ctx;
    const font = "16px sans-serif";

    function videoDimensions(video) {
        // Ratio of the video's intrinsic dimensions
        var videoRatio = video.videoWidth / video.videoHeight;

        // The width and height of the video element
        var width = video.offsetWidth,
            height = video.offsetHeight;

        // The ratio of the element's width to its height
        var elementRatio = width / height;

        // If the video element is short and wide
        if (elementRatio > videoRatio) {
            width = height * videoRatio;
        } else {
            // It must be tall and thin, or exactly equal to the original ratio
            height = width / videoRatio;
        }

        return {
            width: width,
            height: height
        };
    }

    $(window).resize(function () {
        resizeCanvas();
    });

    const resizeCanvas = function () {
        $("canvas").remove();

        canvas = $("<canvas/>");

        ctx = canvas[0].getContext("2d");

        var dimensions = videoDimensions(video);

        console.log(
            video.videoWidth,
            video.videoHeight,
            video.offsetWidth,
            video.offsetHeight,
            dimensions
        );

        canvas[0].width = video.videoWidth;
        canvas[0].height = video.videoHeight;

        canvas.css({
            width: dimensions.width,
            height: dimensions.height,
            left: ($(window).width() - dimensions.width) / 2,
            top: ($(window).height() - dimensions.height) / 2
        });

        $("body").append(canvas);

        // Add button to display object information
        const button = $("<button/>")
            .attr("id", "btnobj")
            .text("Show Flower Info")
            .css({
                position: "absolute",
                top: "20px",
                //left: "20px"
            })
            .click(function () {
                const predictions = getCurrentPredictions();
                displayObjectInfo(predictions);
            });

        $("body").append(button);
    };

    // Run the model on the current video frame; resolves to an array of
    // predictions (empty while the model is still loading).
    const getCurrentPredictions = function () {
        return model ? model.detect(video) : Promise.resolve([]);
    };

    // Ask ChatGPT about the highest-confidence detection and show the reply on screen.
    const displayObjectInfo = function (predictions) {
        predictions.then(async function (predictions) {
            if (predictions.length > 0) {
                // Select the detection with the highest confidence score
                // (roboflow.js detections carry a "confidence" field).
                const object = predictions.reduce((prev, current) =>
                    (prev.confidence > current.confidence) ? prev : current
                );
    
                const objectName = object.class;
    
                const text = "What is " + objectName + "? Give botanical information.";
    
                // Remove previous text area if exists
                $("#objectInfo").remove();
    
                // Create a text area to display object information
                const textArea = $("<textarea/>")
                    .attr("id", "objectInfo")
                    .css({
                        position: "absolute",
                        width: "100%",
                        height: "100%",
                        backgroundColor: "rgba(0, 0, 0, 0.9)", 
                        color: "white", 
                        border: "2px solid white",
                        borderRadius: "5px",
                        resize: "none",
                        top: "80px",
                        padding: "10px", 
                        boxSizing: "border-box",
                        overflow: "auto"
                    });
    
                // Call GPT-3.5 chat completion API
                try {
                    const response = await fetch('https://api.openai.com/v1/chat/completions', {
                        method: 'POST',
                        headers: {
                            'Content-Type': 'application/json',
                            'Authorization': `Bearer ${OPENAI_API_KEY}`,
                        },
                        body: JSON.stringify({
                            model: 'gpt-3.5-turbo',
                            messages: [{ role: 'user', content: text }],
                            temperature: 1.0,
                            top_p: 0.7,
                            n: 1,
                            stream: false,
                            presence_penalty: 0,
                            frequency_penalty: 0,
                        }),
                    });
    
                    if (response.ok) {
                        const data = await response.json();
                        const completion = data.choices[0].message.content;
                        textArea.text(completion);
                        $("body").append(textArea);
                    } else {
                        console.error('Error: Unable to process your request.');
                    }
                } catch (error) {
                    console.error(error);
                    console.error('Error: Unable to process your request.');
                }
            } else {
                console.log("No object detected");
            }
        });
    };
 
    var prevTime;
    var pastFrameTimes = []; // rolling window of frame times for the FPS readout
    const detectFrame = function () {
        if (!model) return requestAnimationFrame(detectFrame); // wait for the model to load

        getCurrentPredictions().then(function (predictions) {
            requestAnimationFrame(detectFrame);
            renderPredictions(predictions);

            if (prevTime) {
                pastFrameTimes.push(Date.now() - prevTime);
                if (pastFrameTimes.length > 30) pastFrameTimes.shift();

                var total = 0;
                pastFrameTimes.forEach(function (t) {
                    total += t / 1000;
                });

                var fps = pastFrameTimes.length / total;
                $("#fps").text(Math.round(fps));
            }
            prevTime = Date.now();
        });
    };

    const renderPredictions = function (predictions) {
        var dimensions = videoDimensions(video);

        var scale = 1;

        ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height);

        predictions.forEach(function (prediction) {
            const x = prediction.bbox.x;
            const y = prediction.bbox.y;

            const width = prediction.bbox.width;
            const height = prediction.bbox.height;

            // Draw the bounding box.
            ctx.strokeStyle = prediction.color;
            ctx.lineWidth = 4;
            ctx.strokeRect(
                (x - width / 2) / scale,
                (y - height / 2) / scale,
                width / scale,
                height / scale
            );

            // Draw the label background.
            ctx.fillStyle = prediction.color;
            const textWidth = ctx.measureText(prediction.class).width;
            const textHeight = parseInt(font, 10); // base 10
            ctx.fillRect(
                (x - width / 2) / scale,
                (y - height / 2) / scale,
                textWidth + 8,
                textHeight + 4
            );
        });

        predictions.forEach(function (prediction) {
            const x = prediction.bbox.x;
            const y = prediction.bbox.y;

            const width = prediction.bbox.width;
            const height = prediction.bbox.height;

            // Draw the text last to ensure it's on top.
            ctx.font = font;
            ctx.textBaseline = "top";
            ctx.fillStyle = "#000000";
            ctx.fillText(
                prediction.class,
                (x - width / 2) / scale + 4,
                (y - height / 2) / scale + 1
            );
        });
    };
});


The above code sets up an application that uses a webcam to detect and identify flowers in real time. It starts by accessing the webcam stream and loading the pre-trained flower detection model from Roboflow. Once the model is loaded, it continuously analyzes each frame of the video stream, drawing bounding boxes and labels around detected flowers. Note that getUserMedia only works over HTTPS or on localhost, so serve the page from a local web server rather than opening the HTML file directly.

When a flower is detected, the user can click the button to request information about it; if multiple flowers are detected, the code selects the one with the highest confidence. Clicking the button triggers a request to the ChatGPT API, which generates a response containing botanical information about the detected flower. The response is then displayed on the screen, providing educational details about the identified flower. Here's the final output of the application.

Conclusion

In this blog post, we've explored how to create an application that detects flowers and retrieves information about them using computer vision and Generative AI. By combining object detection with natural language generation, the application we built showcases the potential of these technologies for creating interactive and educational experiences.

Moreover, the same approach can be applied to detect and retrieve information about any object of interest, whether it's plants, animals, landmarks, or everyday items. This demonstrates the versatility and scalability of the technology, opening doors to a wide range of applications beyond flower identification.

With the ability to detect and provide information about various objects, this approach can be utilized in fields such as education, agriculture, retail, and more. By building upon the foundation laid out in this blog post, developers can create innovative applications that empower users to explore and learn about the world around them in exciting new ways.

All code for this project is available on GitHub. The dataset used for this project is available on Roboflow Universe.