Frankly Speaking, 6/18/19 -- Deep thoughts on deep learnings
A weekly(-ish) newsletter on random thoughts in tech and research. I am an investor at Dell Technologies Capital and a recovering academic. I am interested in security, blockchain, and devops.
If you were forwarded this newsletter, you can subscribe here, and view old newsletters here.
Sorry, there was no newsletter last week because I was at my MIT doctoral hooding ceremony. I graduated a year ago, but I missed the deadline to walk because thesis writing doesn't always align with graduation deadlines. Anyway, for those of you who are curious, this is my thesis, and my website has more about the work I've done.
Anyway, enough about me. I want to give a shoutout to Dell Tech Capital portfolio company Barefoot Networks* for being acquired by Intel. This is DTC's 4th exit this year (CloudEndure, Cylance, Twistlock), and we are only halfway through the year! I'm proud and extremely fortunate to be working with such an awesome team.
WEEKLY TECH THOUGHT
This week, I interviewed Davis Blalock, a machine learning PhD student advised by John Guttag. He shares his thoughts on machine learning research and industry.
1) Biggest AI breakthrough in the last three years in industry:
I don’t think there’s anything I would point to as a breakthrough—just a lot of steady improvement on many fronts. The most surprising results for me have probably been WaveNet and CycleGAN, since the results they generate are so realistic. Also, it’s just outside of three years, but ResNet has had a lot of impact on how people think about designing deep models.
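For readers unfamiliar with ResNet: its key idea is the residual (skip) connection, where each block learns an additive correction to its input rather than an entirely new representation, which keeps gradients flowing through very deep networks. Here is a minimal PyTorch sketch of a simplified residual block (illustrative only; real ResNet blocks also use batch normalization and projection shortcuts, which are omitted here for brevity):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual block: output = relu(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # skip connection: add the input back
```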
2) Problems interested in solving and why:
I’m interested in making deep learning more efficient and more conducive to user privacy.
Deep learning can get you great results, but it often requires enormous datasets and huge amounts of computational resources. If you look at the data most companies have (not the huge internet companies that get all the hype, but most companies in general), they just don’t have that much data. And if you look at where people want to deploy models, a lot of the time it’s low-end smartphones or IoT devices. This limits the applicability of deep learning a great deal.
And even worse, deep learning's inefficiency creates strong incentives to violate people’s privacy and leave data vulnerable. When you need every last bit of data you can get, and you need it in cleartext so you can train on it, and you need it in one place so you can use your specialized hardware, you have a perfect storm for privacy violations and data breaches.
3) How these problems will be applicable in the future:
Industry use of machine learning is still just starting. And for as long as people are training models, there will be a push to train models faster, with less data, and with greater privacy.
4) What big applications will be enabled by a solution to this:
If you could make deep learning extremely fast and extremely data-efficient, you would enable three things:
-First, you would expand the set of applications where deep learning is useful, since there’s a long tail of tasks with limited data. This is particularly true in healthcare, since getting labeled data is hard and there simply aren’t many people with any given combination of age, medications, genetics, comorbidities, etc.
-Second, you would dramatically speed up progress, both in machine learning research broadly and on any given task, since you could try out different models more quickly.
-And third, you could enable much better preservation of privacy. Companies would see stronger diminishing returns with increased data collection, and could afford to use strategies like Federated Learning or even secure multiparty computation to train on data without centralizing it all in cleartext.
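To make the federated learning idea concrete, here is a minimal sketch of federated averaging on a toy linear-regression task. The setup and function names are my own illustration, not any particular library's API: each client trains on its own private data, and only the model weights, never the raw data, are sent back to be averaged.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """One client trains on its own data (toy linear regression)."""
    w = w.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    return w

def federated_average(w, clients):
    """Server averages locally trained weights; raw data never leaves a client."""
    return np.mean([local_update(w, X, y) for X, y in clients], axis=0)

# Toy simulation: three clients, each holding private data from the same task
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=50)))

w = np.zeros(2)
for _ in range(20):  # 20 communication rounds
    w = federated_average(w, clients)
print(w)  # converges near [2.0, -1.0] without pooling the data anywhere
```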
5) Interesting companies in the space and why:
One interesting company to me is Oasis. There are a lot of people trying to enable either private machine learning or data sharing across organizations, and they have the most compelling story for both. That said, they have some fundamental cybersecurity challenges to overcome, in addition to the uncertainty associated with trying to create (or at least radically expand) a new market.
Two other interesting companies are Lightmatter and Lightelligence. Both are MIT spinouts building deep learning ASICs based on photonics. If they succeed, they’ll completely reinvent how deep learning hardware is done. Plus, given how many patents they hold, their success and eventual acquisition could leave one or two companies with a decisive hardware advantage for a number of years.
Finally, I think NVIDIA is in an interesting position. Companies like Graphcore* are already matching or surpassing NVIDIA’s hardware for deep learning, and projects like TVM are slowly eroding the need for its developer tools (e.g., CUDA). Those tools in particular have been instrumental in making NVIDIA GPUs dominant for AI, so AMD and other hardware manufacturers could gain ground in the next few years if they play their cards right.
LET'S BE FRANK
We often see security companies succeed in one geographic market but not another: some are wildly successful in Europe, while others do really well in the US. I've noticed some general cultural differences in attitudes toward security and privacy that help explain this.
Every consumer gives their data to a corporation in some form. The biggest difference is whom the consumer trusts more. In Europe, it seems that consumers trust the government to enact and enforce privacy policies on private corporations. As a result, there are regulations, such as GDPR, and they are actively enforced.

In the US, consumers don't seem to trust corporations or the government to enforce privacy policies. This distrust is a recent trend, driven by Facebook inappropriately using users' data; before that, consumers gave their data to corporations in exchange for using the service. However, I believe users didn't know (1) how invasive and powerful this data was, and (2) that corporations were selling it. For a while, at least, users trusted corporations to self-regulate and respect their privacy. That's why there have recently been more state laws, like in California and New York, that try to protect users' privacy. The trust has shifted from corporations to the government.
The bigger question is always whose responsibility it is to protect user data. Corporations or the government? Maybe non-profits? Services like DuckDuckGo and Tor have become more popular, even though they aren't trying to compete head-on with Google. I think we definitely need more optionality, so that users who care about their privacy can choose a service that doesn't collect their data. Right now, we don't really have that choice. It'll be interesting to see how the industries that have made their profits from collecting user data evolve with further regulation and a more privacy-conscious consumer.
Maybe Facebook's cryptocurrency is a way it's changing its business model... but my thoughts on blockchain are a different post!