Frankly Speaking, 7/7/20 - Why AI/ML fails

A biweekly(-ish) newsletter on random thoughts in tech and research. I am an investor at Dell Technologies Capital and a recovering academic. I am interested in security, AI/ML, and cloud.

If you were forwarded this newsletter, you can subscribe here.

Hope everyone had a great July 4th and a restful weekend! I would like to give a shoutout to all the new subscribers. Welcome! Please forward this to anyone in your network who would enjoy it. More subscribers = more content.


A few weeks ago, I moderated a webinar panel with some of our portfolio founders. We talked about a variety of topics but focused on current and future trends in cybersecurity. Specifically, we discussed how cloud and COVID have changed, and will continue to change, the security landscape.

You can view a recording of it here. I know some of you don’t like to sign up for or listen to recordings, so I will do a write-up of some of the highlights in the next newsletter.

Finally, I’m really trying to up my social media game, especially on Twitter. If you subscribe to this newsletter and like the content, please follow me!



As many of you know, I have regularly expressed my frustration over tech's obsession with data. A lot of the discussion around AI/ML lacks rigor. If companies and startups presented their work at an academic conference, it would be rejected immediately. Although it’s true that data represent the facts, what really matters is how those facts are interpreted!

I have written about this topic piecemeal before, but I’m consolidating it into one post with a bit more explanation. So, why does AI/ML not work? Well… for many reasons, but here I am specifically looking at data biases. Most of this content is based on a paper by Harini Suresh, an MIT PhD student in machine learning, so if you want to learn more, please go read her paper!

tl;dr: Your AI/ML algorithm is only as good as your data, and the data contains biases. You should identify these biases and adjust your algorithm and data to account for them.

There are five main sources of bias in AI/ML: historical bias, representation bias, measurement bias, aggregation bias, and evaluation bias.

Historical bias can occur even if the data collection and measurement process is done perfectly. Wait, what? Yes, there’s a simple problem: the past isn't representative of the future.

A good example is crime data. A perfectly sampled and measured crime dataset might show more crime in poorer neighborhoods, but this might also reflect historical factors. Another example: about 5 percent of Fortune 500 CEOs are women. Should an image search reflect this "fact"? One of the main consequences of historical bias is that it tends to reinforce harm against, or attach negative associations to, a specific identity group. Sometimes it's possible to provide context, but often the context is lost or too complex to convey.
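To make the "perfect data" point concrete, here is a minimal sketch, assuming a hypothetical population in which 5 percent of records have some attribute (a stand-in for the CEO statistic above). Sampling perfectly at random and growing the sample only makes the estimate converge more tightly on the historical rate. Better sampling cannot remove historical bias; deciding what the model should reflect is a separate, human decision.

```python
import random


def estimate_rate(sample_size, population_rate, rng):
    """Draw a perfectly random sample and return the observed proportion."""
    hits = sum(rng.random() < population_rate for _ in range(sample_size))
    return hits / sample_size


rng = random.Random(0)  # fixed seed so the sketch is reproducible
HISTORICAL_RATE = 0.05  # hypothetical stand-in for "about 5 percent"

# Larger, perfectly sampled datasets converge on the past, not on a target we choose.
estimates = {n: estimate_rate(n, HISTORICAL_RATE, rng) for n in (100, 10_000, 100_000)}
for n, est in estimates.items():
    print(f"n={n:>7}: estimated rate {est:.3f}")
```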

Representation bias occurs when part of the population is under-represented in the training data. There are many causes for this. A couple of key reasons are:

  • Only a specific population is sampled.

  • The population the model is used on differs from the population the model was trained on.

For example, if you train a model based on a city population, it might not perform as well on suburban populations because their behaviors are different.
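A simple first check for representation bias is to compare each group's share of the training data with its share of the population the model will actually serve. The sketch below assumes hypothetical group counts, hypothetical population shares, and a hypothetical 0.8 cutoff for flagging a group; none of these come from the paper.

```python
def representation_report(counts, reference_shares, min_ratio=0.8):
    """Compare each group's share of the training data with its share of the
    target population. Returns per-group (share, ratio) and the groups whose
    ratio falls below min_ratio (a hypothetical cutoff)."""
    total = sum(counts.values())
    report = {}
    under = []
    for group, ref_share in reference_shares.items():
        share = counts.get(group, 0) / total
        ratio = share / ref_share
        report[group] = (share, ratio)
        if ratio < min_ratio:
            under.append(group)
    return report, sorted(under)


# Hypothetical numbers: a model trained mostly on city users but deployed everywhere.
training_counts = {"city": 800, "suburban": 150, "rural": 50}
target_population = {"city": 0.3, "suburban": 0.5, "rural": 0.2}

report, under_represented = representation_report(training_counts, target_population)
for group, (share, ratio) in report.items():
    print(f"{group}: {share:.0%} of training data, {ratio:.2f}x its population share")
print("under-represented:", under_represented)
```

This only catches the "who is in the data" problem; a group can be well represented and still suffer from the other biases below.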

Next is measurement bias. Gathering good data is hard; I'm not denying that by any means. We often use data as a proxy for the labels or features we actually care about. For example, we might use arrest data as a proxy for crime rates. The issue that arises is commonly known as information bias or, more precisely, differential measurement error. In short, this bias happens because proxies can be generated differently across groups. This can happen in a few ways:

  • The granularity of data varies across groups.

  • The quality of data varies across groups.

  • The defined classification task is an oversimplification. When you do supervised ML, you need to choose a label to predict, but that label might not represent the actual task. For example, you might want to predict whether a student will be successful, but the label you actually have is the student's GPA.
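Here is a minimal simulation of differential measurement error using the arrests-as-proxy example. It assumes two hypothetical neighborhoods with the same underlying offense rate but different probabilities that an offense leads to an arrest (for example, because one is policed more heavily). All rates are made up for illustration.

```python
import random


def simulate_proxy(n, offense_rate, arrest_prob, rng):
    """Return (true offense rate, observed arrest rate) for one hypothetical group."""
    offenses = 0
    arrests = 0
    for _ in range(n):
        if rng.random() < offense_rate:
            offenses += 1
            # The proxy only records an offense if it leads to an arrest.
            if rng.random() < arrest_prob:
                arrests += 1
    return offenses / n, arrests / n


rng = random.Random(42)  # fixed seed so the sketch is reproducible
# Hypothetical: identical underlying behavior, different policing intensity.
groups = {
    "neighborhood_a": {"offense_rate": 0.05, "arrest_prob": 0.3},
    "neighborhood_b": {"offense_rate": 0.05, "arrest_prob": 0.6},
}
results = {
    name: simulate_proxy(100_000, p["offense_rate"], p["arrest_prob"], rng)
    for name, p in groups.items()
}
for name, (true_rate, proxy_rate) in results.items():
    print(f"{name}: true rate {true_rate:.4f}, arrest-based proxy {proxy_rate:.4f}")
```

A model trained on the arrest column would "learn" that neighborhood B has roughly twice the crime of neighborhood A, even though the underlying behavior is identical.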

Aggregation bias happens when a single model is applied to groups with different conditional distributions. In other words, the model is too general: it rests on the faulty assumption that because it was trained on data from all groups, it works for every group. This bias is common in clinical-aid tools. For example, complications of diabetes vary across ethnicities and genders. Even if the training data contains data from every group, a single model is unlikely to work well for all of them.
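The sketch below shows aggregation bias in its most extreme form, assuming two hypothetical groups in which the same feature has opposite effects on the outcome. A single line fitted to the pooled data (which contains every group) fits neither group; separate fits per group fit both. Separate models are just one simple mitigation chosen for illustration, not a recommendation from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on one group."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = list(range(10))
# Hypothetical: the same feature pushes the outcome up in one group and down in the other.
data = {"group_a": [2 * x for x in xs], "group_b": [-2 * x for x in xs]}

# One model trained on everyone's data.
pooled = fit_line(xs + xs, data["group_a"] + data["group_b"])

results = {}
for group, ys in data.items():
    results[group] = (mse(pooled, xs, ys), mse(fit_line(xs, ys), xs, ys))
    print(f"{group}: pooled MSE {results[group][0]:.1f}, per-group MSE {results[group][1]:.1f}")
```

Here the pooled fit averages the two opposite trends into a flat line, so it is wrong for both groups despite having seen all of the data.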

Finally, there’s evaluation bias, which happens when the data used to benchmark or evaluate a machine learning model does not represent the target population. Training data is used to build a model, while benchmark data is used to judge it. If a benchmark under-represents some groups, a model tuned to do well on that benchmark can still work poorly for those groups. For example, commercial facial recognition algorithms didn't work well on dark-skinned women.
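One practical habit is to report metrics per group instead of a single headline number. This is a minimal sketch, assuming a hypothetical benchmark with 90 examples from one group and only 10 from another; the correct/incorrect counts are invented for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, is_correct) pairs.
    Returns overall accuracy and a dict of per-group accuracy."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        correct[group] += int(is_correct)
    overall = sum(correct.values()) / sum(totals.values())
    per_group = {g: correct[g] / totals[g] for g in totals}
    return overall, per_group


# Hypothetical benchmark: group_b is barely represented.
benchmark = (
    [("group_a", True)] * 88 + [("group_a", False)] * 2
    + [("group_b", True)] * 5 + [("group_b", False)] * 5
)
overall, per_group = accuracy_by_group(benchmark)
print(f"overall accuracy: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.2f}")
```

The overall number looks great, but it is dominated by the well-represented group; the per-group breakdown is what exposes the coin-flip performance on the other one.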

So, you might ask: what can I do about this? The first step is to understand where bias exists and where it comes from, so that we can properly account for it in our models. AI/ML is not magic. It has substantial limitations, and we still need humans to make sure we are creating and training models properly.

Some open questions:

  • What security applications are good use cases for AI/ML? What aren’t?

  • What should AI/ML infrastructure look like at a company? What is the DevOps equivalent for AI/ML?

  • Martin Casado and Matt Bornstein wrote about how the business of machine learning is different from traditional software. How do we account for this in our business models? How do we “program” these costs in?


Simpler is better and smarter. I prefer sentences that have meaning, not just words strung together.