At their Worldwide Developer's Conference in 2019, Apple added object detection support to CreateML, their no-code machine learning app. This means, in theory, you can get a trained model suitable for use in your iPhone application without writing a single line of code.
And that's true... except for one thing. To train a machine learning model you need data. And to load data into CreateML you'll need it in the proper format. Unfortunately, Apple CreateML uses a proprietary JSON format that isn't widely supported in labeling tools. So most users will need to write some boilerplate code to convert their annotations.
To complete this tutorial you will need an Apple macOS device with CreateML installed (this comes free with XCode which is available in the Mac App Store).
You can collect data simply by taking pictures with your iPhone. Collect a variety of photos of your subject from different angles, with different backgrounds, and in different lighting. The more variation in the photos the better.
Once you've got a hundred or so images you're ready to label them. There are a variety of free tools available to do this. Take your pick from: CVAT, VoTT, LabelMe, or LabelImg and create bounding boxes around the objects you'd like to detect.
Then drop your images and annotations into Roboflow and export in CreateML format (optionally add some augmentations to increase the size of your dataset).
Now comes the fun part. We're ready to train a model! Fire up CreateML and create a new Object Detector project.
Then we'll unzip the download from Roboflow and point CreateML at the training, validation, and testing data we downloaded:
And we're ready to train. Just hit the "Train" button at the top and we're on our way! Training might take a while so grab a coffee (or let it run overnight depending on the size of your dataset and the number of iterations you picked) while it trains.
A word of warning: your computer is going to be working in overdrive so don't expect to be able to use it for other things while the model is training.
As your model trains, you'll watch the blue line tick to the right (and hopefully down). This line measures your validation loss which is one measure of how well your model is fitting your dataset. Lower numbers are better.
At the end of training you will see a percentage for Intersection Over Union (I/U) for each class and a Mean Average Precision for your Training, Validation, and Testing sets at the top. Higher values are better here.
From my screenshot above we can see that the model is better at detecting faces with masks than faces without masks. And that it does much better on the training and validation data than the testing data. This means the model did not generalize very well to data it hadn't seen before.
But numbers can be hard to grok. To truly see how your model performs, click on the "Output" tab and drop your test images in to see a visualization of your model prediction.
Here we can see that our model actually did quite well! It's detected two people with masks. Unfortunately, in some other images, it did worse; for example it missed the child with the mask and the lady whose face was partially obscured by the man's arm in this example:
This is a pretty good start! I can actually drop this CoreML file directly into the Apple sample app and run it on my iPhone's live camera feed.
But getting a model that is good enough to use in production requires experimentation and iteration. To improve this model I would go out and collect more varied training examples, add them to my dataset, and train another version of my model.
Then repeat the process of identifying where it's least accurate, adding more data, and training again until it's ready for prime time!
How did your model do? Did you find some augmentations that improved your model's performance? Tweet us, we'd love to hear!