There are two complementary ways to think about algorithmic bias. The first is the social and ethical side; the second is the more technical side: how we detect bias and mitigate it.

Today we’re going to dive into the technical side of avoiding bias in computer vision models to give you an introduction to the topic. If you are interested in learning more about the social and ethical side, The Power of Representation in Computer Vision serves as a good introduction.

It’s important to note that bias in machine learning isn't only an ethical or technical question, but also a business question, because model errors are lost business. Imagine a model that incorrectly rejects loan applicants. Regardless of whether the cause was ethical or technical, the business incurred a loss because of it.

[Video: Carnegie Mellon University guest speaker on Preventing Algorithmic Bias]

What is Bias in Machine Learning?

To put it simply, bias is the average error of a machine learning model after training on a dataset. When we think about bias, we are always thinking about residuals: how far the model’s predictions are from the truth. In other words, bias measures how well a model captures the patterns in a dataset in an accurate and generalized way.

Another measurement we need to look at when speaking about bias is variance. Variance describes how much the output of a model changes when it is presented with new data. One way to observe variance is to analyze test error: high variance shows up when our training error is close to zero while our test error is high, a situation also known as overfitting.
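These two quantities come together in the standard bias-variance decomposition of expected squared error. For a model $\hat{f}$ trained on random draws of the training data, with true function $f$ and irreducible noise $\sigma^2$:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] =
\underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} +
\underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}} +
\sigma^2
$$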

In an ideal model, both the train error and test error are consistently low, even across different training runs.
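To make that concrete, here is a minimal sketch of how you might diagnose high bias versus high variance from train and test error. It uses scikit-learn and synthetic data purely for illustration, and the thresholds are arbitrary rules of thumb, not values from this post.

```python
# A minimal sketch: diagnosing bias vs. variance from train/test error.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained decision tree tends to memorize the training set (high variance).
model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
train_err = 1 - model.score(X_tr, y_tr)
test_err = 1 - model.score(X_te, y_te)

if train_err < 0.02 and test_err > train_err + 0.1:
    print(f"Likely overfitting (high variance): train={train_err:.3f}, test={test_err:.3f}")
elif train_err > 0.2:
    print(f"Likely underfitting (high bias): train={train_err:.3f}")
else:
    print(f"Looks balanced: train={train_err:.3f}, test={test_err:.3f}")
```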

So, how can we mitigate bias in our models?

Data-First Mentality

Collect Large and Representative Data

Training a model with a small dataset will result in high variance, making overfitting hard to avoid because there are few observations relative to the number of predictors. A large enough dataset with representative data will help the model generalize effectively. Representative data means that the training data has similar characteristics to the data the model will see once deployed. For example, if we are detecting dogs and our model will be deployed both indoors at a clinic and outdoors at a dog park, we would want data representing both environments, including all the varieties of dogs found in each.
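As a rough illustration, a small script can audit whether your collected data actually covers the deployment environments. The manifest file and its `environment` field below are hypothetical stand-ins for however you track image metadata.

```python
import json
from collections import Counter

# Hypothetical manifest: a JSON list of {"file": ..., "environment": ...} records.
with open("dataset_manifest.json") as f:
    records = json.load(f)

env_counts = Counter(r["environment"] for r in records)  # e.g. "clinic", "dog_park"
total = sum(env_counts.values())
for env, n in env_counts.most_common():
    print(f"{env}: {n} images ({n / total:.1%})")
# A badly under-represented deployment environment means more collection is needed there.
```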

Improving Your Data With Active Learning

Active learning accelerates the rate at which a model improves by taking a series of intentional steps when passing data into the model. Let’s say we don’t have enough images of corgis in a dog breeds dataset, and because of that, our model does not perform well when presented with corgis. With active learning, we prioritize getting corgi data into our training set, retraining, and redeploying. By doing this, the model learns from the data that was most likely to fool it.

Models are optimized based on the variables and parameters available at the time they were created. As the world around us changes, a model will encounter environments it never saw during training. With active learning, we can expose it to new data, retrain, and redeploy it.
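A minimal sketch of the selection step in an active learning loop might look like the following; `model.predict` and its confidence output are hypothetical placeholders for your own inference call.

```python
def select_for_labeling(model, image_paths, confidence_threshold=0.5):
    """Flag images the deployed model is least sure about for labeling."""
    to_label = []
    for path in image_paths:
        predictions = model.predict(path)  # hypothetical inference call
        confidences = [p["confidence"] for p in predictions]
        # No detections, or a low-confidence one, suggests the image may fool the model.
        if not confidences or min(confidences) < confidence_threshold:
            to_label.append(path)
    return to_label
```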

Roboflow has an Upload API that you can use to programmatically collect and send real-world images directly from your application to improve your model.
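For example, with the `roboflow` Python package, uploading a newly collected image to a project looks roughly like this (the workspace and project names are placeholders):

```python
from roboflow import Roboflow

# Authenticate and point at your project (placeholder names).
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-project")

# Send a hard real-world example back into the dataset for labeling and retraining.
project.upload("hard_example.jpg")
```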

Perform Model Error Analysis

On Roboflow, when you finish training your model, you are provided with the training results, along with the ability to view more detail and generate visualizations. These visualizations let you see which images in the validation and testing datasets the model performed poorly on compared to the ground truth.

This can help validate proposed labels and reveal whether you need additional or different types of data, such as images from different environments, before retraining. The same process can also surface missing or mislabeled data.
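One way to approximate this kind of error analysis yourself is to rank images by how much the model’s detections disagree with the ground truth. The `model.predict` call and annotation format below are hypothetical placeholders:

```python
from collections import Counter

def rank_by_error(model, annotations):
    """Rank images by disagreement between predictions and ground truth.

    annotations: {image_path: [ground-truth class labels]} -- hypothetical format.
    """
    errors = []
    for path, truth in annotations.items():
        predicted = [p["class"] for p in model.predict(path)]  # hypothetical call
        # Labels the model missed plus labels it hallucinated (a crude error proxy).
        missed = sum((Counter(truth) - Counter(predicted)).values())
        extra = sum((Counter(predicted) - Counter(truth)).values())
        errors.append((missed + extra, path))
    return sorted(errors, reverse=True)  # worst images first
```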

[Video: Comparing model predictions to the ground truth in the testing set]

Health Check

The health check feature displays measurements of class balance and imbalance. To continue with our dog example, this could tell us that we have too few corgis and need to do some active learning to improve performance on corgi data. The number of null images is also helpful, because null images help the model recognize when the objects we are looking for are not present.
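A back-of-the-envelope version of such a class-balance check is sketched below; the annotation structure is a hypothetical example, where an empty label list marks a null image.

```python
from collections import Counter

def class_balance(annotations):
    """annotations: {image_path: [class labels]}; an empty list marks a null image."""
    counts = Counter(label for labels in annotations.values() for label in labels)
    null_images = sum(1 for labels in annotations.values() if not labels)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    for cls, n in counts.most_common():
        print(f"{cls}: {n} annotations ({n / total:.1%})")
    print(f"null images: {null_images}")
```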

Be Deliberate with Duplicate Images

Having duplicate images in a dataset introduces bias because it gives a model extra opportunities to learn patterns specific to the duplicates; in other words, duplicate images get a disproportionate amount of training time. We also need to make sure duplicates do not end up in different train, validation, and test splits, since their presence there also biases the evaluation metrics. Thankfully, Roboflow automatically removes duplicates during the upload process, so if you upload images through Roboflow you won’t need to worry about this part of the mitigation.
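If you are managing data outside Roboflow, a simple file-hash check can catch exact duplicates before you create your splits. Note that this only finds byte-identical copies; catching near-duplicates would require perceptual hashing (e.g., the `imagehash` library) instead.

```python
import hashlib
from pathlib import Path

def find_exact_duplicates(image_dir):
    """Return (duplicate, original) path pairs of byte-identical images."""
    seen, duplicates = {}, []
    for path in sorted(Path(image_dir).rglob("*.jpg")):  # adjust the glob to your formats
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
    return duplicates
```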

Discuss & Share

You can discuss or ask questions about improving your model and how to make it less biased at discuss.roboflow.com. We also highly encourage our readers to add their datasets and models to Roboflow Universe. (We are desperately in need of corgis.)