Frankly Speaking, 12/3/19 - Spotting AI Snake Oil
A biweekly(-ish) newsletter on random thoughts in tech and research. I am an investor at Dell Technologies Capital and a recovering academic. I am interested in security, blockchain, and cloud.
If you were forwarded this newsletter, you can subscribe here, and view old newsletters here.
Hope everyone had a great Thanksgiving! It's been a busy month trying to close out what has been a crazy year. We've had 7 exits this year (4 in security), with JASK's acquisition by Sumo Logic being the most recent. It's been a good year so far!
LET'S BE FRANK
As a VC, you can imagine that I see tons of companies claiming to do AI/ML or some new kind of it. As irritating as that might be, it's sadder that, many times, the companies themselves don't know why they use certain AI/ML techniques. To put it simply, if they gave their explanations in an academic paper, that paper would be an easy reject.
Anyway, my friend Kelly Shortridge sent me a talk by Arvind Narayanan, a professor at Princeton, on how to recognize AI snake oil. Here are the annotated notes for the talk. I'm going to summarize most of the talk here and inject my commentary. He makes very good points. To add to them, I believe that in most cases complicated AI/ML is unnecessary; simple techniques like clustering and linear regression are sufficient. The real issue is in the data collection and processing.
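To make that concrete, here's a minimal sketch of the kind of "simple first" check I mean. The data, column shapes, and scores are entirely hypothetical (it uses scikit-learn and random stand-in data); the point is just that a plain clustering pass and a plain linear regression are a few lines of code, and if they already perform well on your data, there's no reason to reach for deep learning.

```python
# Hypothetical baseline check: before buying "AI", see how far
# simple clustering and linear regression get you on your own data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))             # stand-in feature matrix
y = X[:, 0] * 2.0 + rng.normal(size=1000)   # stand-in outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Simple technique 1: group the data into a handful of clusters.
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_train)

# Simple technique 2: plain linear regression as a predictive baseline.
model = LinearRegression().fit(X_train, y_train)
print("linear regression R^2:", r2_score(y_test, model.predict(X_test)))
```

If a vendor's "proprietary AI" can't clearly beat numbers like these on the same data, the complexity isn't buying anything.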
In the past, I've talked about various biases in data that make AI/ML have unintended consequences. Here is the previous newsletter on this topic, and over the next 2-3 weeks, I'll talk more about these biases.
tl;dr: AI/ML does well on some tasks, but it cannot predict social outcomes. In most cases, manual scoring rules are just as accurate and more transparent.
Let's start with an example. Many companies claim to assess job suitability and personality from a 30-second video.
Let's be honest. We know this can't be possible. In fact, AI researchers have shown it isn't; the results are about as good as those from a random number generator. Yet companies in this space have raised hundreds of millions of dollars just because they claim AI will solve all our problems. Call me a skeptic, but my PhD taught me to recognize when technology is too good to be true.
So, why is there so much AI/ML snake oil? AI is an umbrella term that covers a lot of technologies, and some of them have made substantial, widely publicized progress in the last 5 years. Because of this, companies are calling everything AI to exploit the confusion. It reminds me a lot of the internet craze of the 1990s.
Techniques that have improved in the last 5 years include face recognition, speech-to-text, and content identification. We are pretty good at those.
Techniques for spam detection, content recommendation, and hate speech detection are still far from reliable, but they are improving. However, bias in these use cases is inevitable.
Finally, AI should not be used for predicting social outcomes such as criminal recidivism, job performance, policing outcomes, and at-risk kids, to name a few. Here is a research paper detailing why. It describes a massive study in which 457 researchers worked with detailed information collected about children and their families. It basically showed that a model built on roughly 13,000 features was no better than one focused on 4 core features. With only 4 core features, you can use a 4-variable linear regression, which has been around forever in statistics. No complicated machine learning or deep learning required!
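For intuition, here's a rough sketch of what that comparison looks like: a 4-variable linear regression next to a much heavier model trained on hundreds of features. This uses made-up, synthetic data and illustrative feature counts, not the actual Fragile Families data; the study's point was that on these social-prediction tasks, the fancy model doesn't buy you much over the simple one.

```python
# Hypothetical comparison: a 4-variable linear regression vs. a larger
# ensemble model trained on many extra (mostly uninformative) features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n = 2000
core = rng.normal(size=(n, 4))      # stand-ins for 4 "core" features
noise = rng.normal(size=(n, 200))   # hundreds of weak extra features
outcome = core @ np.array([0.5, 0.3, 0.2, 0.1]) + rng.normal(size=n)

X_all = np.hstack([core, noise])
X_train, X_test, y_train, y_test = train_test_split(X_all, outcome, random_state=0)

# 4-variable linear regression on the core features only.
lin = LinearRegression().fit(X_train[:, :4], y_train)
# Bigger model thrown at every feature.
gbr = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

print("4-variable linear regression R^2:",
      r2_score(y_test, lin.predict(X_test[:, :4])))
print("gradient boosting, all features R^2:",
      r2_score(y_test, gbr.predict(X_test)))
```

In this toy setup the two scores come out close, which is exactly the pattern the study found on real data: the extra features and the heavier model add complexity and opacity without adding predictive power.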
We need to be careful about the claims that many companies make about AI/ML, especially when it comes to predicting social outcomes. Getting this wrong can be harmful and raises serious ethical concerns. We really need to be more rigorous in our analysis of how AI/ML is used in companies' products.