How to Build a Custom Open Images Dataset for Object Detection
We are excited to announce integration with the Open Images Dataset and the release of two new public datasets encapsulating subdomains of the Open Images Dataset: Vehicles Object Detection and Shellfish Object Detection.
In this post, we will walk through how to make your own custom Open Images dataset.
Vehicles and Shellfish are just a small window into the vast landscape of the Open Images dataset and are meant to provide small examples of datasets that you could construct with Open Images.
About Open Images
Open Images is an open source computer vision dataset released by Google under a CC BY 4.0 license. It contains a vast amount of data spanning image classification, object detection, and visual relationship detection across millions of images and bounding box annotations, providing widespread, large-scale ground truth for computer vision research.
Why Create A Custom Open Images Dataset?
There are many reasons to create a custom Open Images dataset:
- Experiment with creating a custom object detector
- Assess feasibility of detecting similar objects before collecting and labeling your own data
- Augment an existing training set
- Train a custom detector model checkpoint to apply to a more niche task where you have less data
- And of course, for fun 😁
Remember: this is all free, labeled computer vision data that lives in the Creative Commons.
The Open Images Query Tool
The whole Open Images dataset weighs in at roughly half a terabyte, and downloading it raw means pulling entire splits straight from its cloud storage buckets.
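To give a rough sketch of what that looks like: the official Open Images download instructions point to a public S3 bucket, and at the time of writing the raw splits come down with commands along these lines (the s3://open-images-dataset paths may change between dataset releases):

```bash
# Sync full Open Images splits from the public S3 bucket (no AWS account required).
# Each split is tens to hundreds of gigabytes, so check your disk space first.
aws s3 --no-sign-request sync s3://open-images-dataset/train open-images/train
aws s3 --no-sign-request sync s3://open-images-dataset/validation open-images/validation
aws s3 --no-sign-request sync s3://open-images-dataset/test open-images/test
```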
Luckily, the open source community has created tools that make querying the Open Images database easy. To construct our custom Open Images datasets, we used the OIDv4_ToolKit. The OIDv4_ToolKit lets you query subdomains of Open Images and limit the download to specific classes: with a single command, you can specify the classes and the number of images you want, and the images come down with their bounding box annotations attached.
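Here is a minimal sketch of that workflow, assuming the EscVM/OIDv4_ToolKit repository and its documented downloader flags (the shellfish classes are just illustrative):

```bash
# Grab the toolkit and install its dependencies
git clone https://github.com/EscVM/OIDv4_ToolKit.git
cd OIDv4_ToolKit
pip install -r requirements.txt

# Download up to 300 training images per class, with bounding box labels included
python3 main.py downloader --classes Lobster Crab Shrimp --type_csv train --limit 300
```

The downloaded images and their label files land in a folder structure organized by split and class.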
Converting Open Images Annotation Formats
We are excited to announce that we now support Open Images data formats at Roboflow. When you download Open Images data, you receive a large, intractable CSV file containing all of the annotations in the entire dataset, along with a class map. You also receive per-image .txt annotation files that are much more tractable. We support both of these formats, but I recommend using the .txt files.
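For a sense of what those per-image files contain, the OIDv4_ToolKit writes one object per line as the class name followed by the pixel coordinates of the box corners; the values below are made up purely for illustration:

```
Lobster 104.2 87.5 633.9 411.0
Crab 12.0 300.4 221.7 498.2
```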
In order to convert your annotations into any format, simply make a free account with Roboflow and drag your images and annotation files into the data upload flow.
Once your dataset is created, you will be able to export it in any format you desire. To name a few, you will be able to:
- Convert Open Images to COCO JSON
- Convert Open Images to Pascal VOC XML
- Convert Open Images to Create ML JSON
- Convert Open Images to YOLO Darknet
- Convert Open Images to Amazon SageMaker
- Convert Open Images to Amazon Rekognition
- Convert Open Images to TFRecord
- Convert Open Images to YOLOv5 PyTorch
Then you can train your custom detector with whichever model you like! At the time of writing, I am mostly training YOLOv5 detectors.
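If you go the YOLOv5 route, a minimal training run looks roughly like the sketch below once you have exported your data in YOLOv5 PyTorch format (the data.yaml path and hyperparameters are placeholders to adjust for your dataset):

```bash
# Clone Ultralytics YOLOv5 and install its dependencies
git clone https://github.com/ultralytics/yolov5.git
cd yolov5
pip install -r requirements.txt

# Fine-tune a small pretrained checkpoint on the exported dataset
python train.py --img 640 --batch 16 --epochs 100 --data ../data.yaml --weights yolov5s.pt
```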
You can also merge your new custom dataset with another one of your datasets to increase coverage.
Introducing Roboflow's Public Custom Open Images Datasets
We have created two custom Open Images datasets, Vehicles Object Detection and Shellfish Object Detection, and shared them for public use among our public computer vision datasets.
Conclusion
Now you know how to construct a custom Open Images dataset using completely free computer vision data and open source tools.
We look forward to seeing what you build with Open Images! 🚀
If you are interested in scaling up these datasets or working on creating your own, please drop us a line!