My thoughts on DeepSeek and what it means for security
It's time for security to innovate faster
Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer or any other entities with which I am affiliated.
We’re hiring for several roles at Headway in our Trust organization, led by the fearless Susan Chiang. Specifically, we’re hiring a software engineering manager, a product manager, and a product security engineer.
All these roles will work closely with me! If you’re excited to help build a new mental healthcare system that everyone can access, please apply and/or reach out to me!
I spent a good part of my Monday talking reading articles and talking with my friends who are AI/ML experts about DeepSeek. There’s a lot of content on the internet. Some of them are great explanations of what’s going on, but with all things like this, there are a lot of opinions about what this means for the future of AI and technology. Of course, I don’t agree with all the opinions, but the frustrating part is that many of them, especially ones related to security, are based on incorrect understandings of the technical details of DeepSeek.
I don’t claim to be an AI/ML expert by any means. I usually learn by talking to experts, who know way more than I do, or by reading articles and papers written by those experts.
Anyway, this newsletter likely won’t be as coherent as my previous newsletters (not that those are particularly coherent). It’ll be a mixture of my initial thoughts based on what I’ve read and heard.
What is DeepSeek?
It seems that many people have already read about this, so I won’t dive too much into detail. DeepSeek is a Chinese AI startup that was able to create a chatbot with similar performance to OpenAI o1 but with a fraction of the cost. My basic understanding of the innovation is that they can do this with pure reinforcement learning (RL) rather than requiring human feedback to teach the RL. In other words, this model can learn itself without requiring humans to train it. All it needs is compute and data!
Based on what I heard, none of these techniques are new. In fact, many of these were used in a similar AI-related program called AlphaGo. What is impressive is that they achieved similar performance but lower costs for the same types of reasoning problems as OpenAI’s o1.
If you want to learn more about DeepSeek, I recommend this Stratechery article by Ben Thompson. It captures most of the technical nuances but is highly accessible, especially for those who don’t have deep AI/ML knowledge like me.
Why are people worried?
Not to be too philosophical, people are generally worried about the unknown and the new. What’s most shocking is that this innovation happened despite Biden’s AI chip ban. Initially, this was intended to slow down innovation in China, but this result seems not to have that effect. This spurs further worries that it seems that even with better access to technology, the US might fall behind in the future.
Regardless, we shouldn’t be worried or find reasons to discredit DeepSeek. We should take this as a learning experience. What are some early takeaways? Noah Smith, who writes the Noahpinion Substack, wrote an interesting article discussing key learnings. It’s definitely worth a read, but here are the highlights:
LLMs don’t have very much of a “moat” — a lot of people are going to be able to make very good AI of this type, no matter what anyone does.
The idea that America can legislate “AI safety” by slowing down progress in the field is now doomed.
Competing with China by denying them the intangible parts of LLMs — algorithmic secrets and model weights — is not going to work.
Export controls actually are effective, but China will try to use the hype over DeepSeek to give Trump the political cover to cancel export controls.
What this also means is that AI will be cheaper and more accessible. This will likely increase demand overall. This is similar to Uber essentially expanding the taxi market by making ridesharing cheaper and more accessible. My friend Andrew Chou, who does AI/ML at Amplitude, expands on this.
What does this mean for security?
The inevitable is happening, and it’s happening now! With cheap AI, companies are going to use it more. When AI usage increased initially with OpenAI, it felt like security leaders were trying to convince business leaders that since the cost was high and there were security risks, it tipped the scale toward convincing some companies not to be early adopters.
However, the lower costs reduce a lot of the business risks of using AI. In other words, it feels like a good investment to try. As common with most technologies, these costs will continue to decrease.
Using security as a reason not to use AI will probably no longer be a valid excuse. If anything, it shows that security is inflexible in adjusting risk to adapt to changing conditions. In other words, we, as a security community, should figure out how to support our team’s use of AI rather than finding ways to restrict or disallow it. Another example is not allowing the use of the Internet in the early days.
It’s also an opportunity for security. I’ve discussed how it could positively affect appsec.
This might be the time for AI security companies. With all new products, security is likely not the first thought. For example, with DeepSeek, security didn’t seem like a top priority. Recent research found that their models are also more vulnerable to prompt injection and jailbreaking. However, this shouldn’t be a reason not to use these models because these aren’t fundamental problems with the model itself. It’s similar to saying that we should use AWS because it can be hacked, or that we shouldn’t write code because it’s possible to write vulnerable code. All tools need some form of guardrail. In fact, it’s actually an opportunity for security to be more involved in the AI product.
AI companies also spend large amounts of money and compute power training these models. However, it seems that it’s easy to “steal” much of this training through distillation. What is distillation? Ben Thompson explains it well,
Distillation is a means of extracting understanding from another model; you can send inputs to the teacher model and record the outputs, and use that to train the student model. This is how you get models like GPT-4 Turbo from GPT-4. Distillation is easier for a company to do on its own models, because they have full access, but you can still do distillation in a somewhat more unwieldy way via API, or even, if you get creative, via chat clients.
Distillation obviously violates the terms of service of various models, but the only way to stop it is to actually cut off access, via IP banning, rate limiting, etc. It’s assumed to be widespread in terms of model training, and is why there are an ever-increasing number of models converging on GPT-
4o
quality.
Although it seems hard to stop distillation overall, this is an area where security has a lot of expertise.
It also means that security has to ramp up on AI. I know this might be tough, and security has been asked to ramp up on a lot of new technologies recently. However, AI is probably the most transformative technology since the creation of the internet. Security teams need to realize that AI will happen with or without them. That is, I could imagine executive teams replacing security leaders who try to restrict AI usage with those who are willing to enable and manage it.
To do this, security teams need to understand the interworkings of LLMs and bring up specific technical concerns and solutions rather than broad risks. For example, most security people quickly jumped to the fact that DeepSeek is insecure and might leak information without distinguishing between DeepSeek-hosted, other cloud-hosted, and the open-source version. It’s also unclear if these certain problems, e.g. prompt injection attacks, are fundamental problems that all models face or if they are DeepSeek-specific. How fixable are these issues, and how do they affect performance? There are a lot of security risks that lack technical depth like this:
Sure, there’s always some risk, but these risks are vague and lack business context on whether these risks are worthwhile. If security wants to be closer to the business, this is our opportunity. We’re at an important crossroads in AI, and security teams have to choose a path.