Many of you may have been following Roboflow as students, learning from our computer vision tutorials and model library. If you have been using Roboflow, you have been training a custom computer vision model of some sort: you gathered a custom dataset, labeled it, and trained a popular architecture on the task.
With the introduction of Roboflow Universe, we have gathered a growing public repository of thousands of computer vision datasets. This collection has immense potential to inform the research community about the generalizability of their models.
We are seeking an intern who would be interested in first-authoring the paper that introduces the Roboflow Universe Datasets.
We envision that the first paper on these datasets would follow something like the following outline:
1 - Introduction
We introduce the state of computer vision research and note that there are many single-dataset benchmarks, but no benchmark that tests generalizability. We drive home the need for a dataset that spans domains. We talk a bit about what Roboflow Universe is and how the dataset was inspired.
2 - Related Work
We write about Pascal VOC, ImageNet, COCO, etc., what they were trying to achieve, and how ML in practice has deviated from the ideal of a single large network that generalizes to anything.
3 - Dataset Collection and Distribution
We present the Roboflow Universe dataset, already labeled and curated by thousands of Roboflow users.
We release this dataset as a benchmark as well as in packets, e.g. "Industrial Datasets" and "Satellite Datasets".
Perhaps we also export datasets as a result of dataset search, which is upcoming on Roboflow's roadmap.
4 - Modeling Experiments
We benchmark popular model architectures on our suite of Roboflow Universe datasets in a grid. Most of the coding for this internship will happen here, in automating the computation of this benchmark. You will not be GPU constrained; that is, you will have as many GPUs as you need.
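To give a flavor of what that automation might look like, here is a minimal sketch of a benchmark grid runner. All model and dataset names below are hypothetical placeholders, and `train_and_eval` stands in for whatever training-and-evaluation routine each architecture actually requires; this is an illustration of the grid structure, not Roboflow's implementation.

```python
from itertools import product

def benchmark_grid(models, datasets, train_and_eval):
    """Run every (model, dataset) pair and collect a metric grid.

    train_and_eval is any callable that trains the given architecture
    on the given dataset and returns a scalar metric (e.g., mAP).
    """
    results = {}
    for model, dataset in product(models, datasets):
        results[(model, dataset)] = train_and_eval(model, dataset)
    return results

# Hypothetical architectures and dataset packets for illustration.
models = ["yolo-small", "faster-rcnn", "efficientdet-d0"]
datasets = ["industrial/example-a", "satellite/example-b"]

def fake_train_and_eval(model, dataset):
    # Placeholder; a real run would train the model and return its
    # validation metric on the held-out split of the dataset.
    return 0.0

grid = benchmark_grid(models, datasets, fake_train_and_eval)
print(len(grid))  # one entry per (model, dataset) pair
```

The resulting dictionary maps each (architecture, dataset) pair to its score, which is exactly the grid a results table or appendix breakdown would be generated from.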
5 - Results
In the paper, we include the compressed results in the main body and the more granular views in the appendix.
6 - Discussion
We discuss the results and perhaps even theorize about model architectures. Then we make a call to action for the community to start using this benchmark and creating new derivative benchmarks.
7 - Conclusion
We review our dataset contribution and provide a link to the dataset's website and GitHub.
Email if Interested
We will shoot for an arXiv publication at a minimum and a long-paper journal submission to a top venue at a maximum.
This is a paid internship.
If this internship interests you, please write us a short note about why you are interested and drop your application here.
NOTE: This position has been filled.
As always, happy training!