The new threat model for AI

It's not about leaks anymore

Jul 16, 2025

Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer or any other entities with which I am affiliated.

I’ve intentionally made all of my posts free and without a paywall so that my content is more accessible. If you enjoy my content and would like to support me, please consider buying a paid subscription:

Support me with a paid subscription

For those who are interested in key services, there’s a cool new one called FOKS, which is a federated protocol for managing cryptographic keys. It’s super cool and applies a lot of cool cryptography, so you should check it out if you’re interested in better key management!

I was having dinner with a friend recently when she asked me, “What’s actually different about securing AI?” It was a deceptively simple question. I’ve written before about the basics: monitor the models, audit the data that goes into them, and make sure you understand how they’re being used. But I realized I hadn’t yet stepped back and asked the deeper question: how does AI actually change the threat model?

Security for AI: A lot is unknown

Frank Wang

November 26, 2024

Read full story

Most security programs still operate under the same assumptions that have guided them for years. They lean heavily on STRIDE and similar frameworks, where information disclosure and denial of service are the primary risks. It’s for good reason because data breaches and outages hurt. Companies worry about losing sensitive customer data, having IP stolen, or being held hostage by ransomware.

But as companies begin to deeply embed AI into their products and operations, the more relevant concern isn’t theft or downtime, but it’s about manipulation.

AI Makes Tampering the Primary Risk

With AI, the goal of the attacker shifts. In many cases, they don’t want to exfiltrate anything. They don’t want to knock your systems offline. They want to stay in the loop by quietly influencing outcomes, nudging model behavior, or sabotaging trust in subtle ways.

This is a fundamentally different kind of risk:

Prompt injections are designed to silently rewrite model behavior without detection.
Training data poisoning that alters future predictions without triggering any red flags.
Output manipulation that degrades decisions without ever crossing a traditional policy boundary.

And because AI systems are inherently non-deterministic, detecting this kind of tampering is significantly harder. There’s no fixed expected output. You can’t just hash the result and flag anomalies. You need context, intent, and provenance.

We’ve Been Chasing the Wrong Signals

In the classic model, we fixate on logs that say “someone downloaded X” or “process Y accessed Z.” But those signals are almost meaningless in the AI context. The AI isn’t just reading a file — it’s interpreting, generating, and acting on inputs. And when those inputs are poisoned, or the outputs subtly biased, it’s not obvious that anything went wrong.

So, what does this mean in practice?

Exfiltration is no longer the final step because it might not happen at all.
The attacker’s goal is persistence, not extraction.
The damage isn’t obvious because it accumulates through degraded decisions.

This is much closer to the insider threat model, except the “insider” is an opaque statistical model that nobody fully understands.

Identity Becomes the New Anchor

If outputs are probabilistic, and attacks aim to stay inside, then the only reliable signal you have is who is doing what and why.

Identity and provenance go from being useful metadata to the primary control surface.

Every action in an AI system, i.e., every prompt sent, every model invoked, every agent task executed, needs to be attributable. It’s not just for audit but for real-time trust.

That means:

Logging prompt chains with origin attribution (user, app, system agent).
Tying model outputs to the inputs and identities that generated them.
Tracking autonomous agents through their full decision chain, including downstream effects and side-channel actions.

In traditional security, this level of traceability was reserved for sensitive systems. In the AI world, every AI system is sensitive, not because of the data it holds, but because of the influence it exerts.

What the New Threat Model Looks Like

If I had to sketch the new AI-centric threat model in one line, it’s this:

The attacker’s goal is to influence decisions from inside the system without being noticed.

That shifts security’s job from defending the perimeter to defending the control loop.

So, rather than asking “Did someone steal something?”, we start asking:

Was this AI output appropriate for the context?
Was this prompt injection successful?
Did this agent's behavior deviate from expected policy or intent?
Was this model fine-tuned in an authorized, verifiable way?

And to answer those questions, we need two things security hasn’t historically been great at:

Fine-grained behavioral baselining, even in non-deterministic systems.
Action attribution, especially across complex chains of input/output transformations.

What Needs to Change

We need to rethink how we secure systems that behave probabilistically, act autonomously, and interact via natural language. A few concrete shifts:

Move from exfiltration monitoring to intent monitoring. A prompt that subtly alters an AI agent’s behavior is just as dangerous as an API key leak, but won’t show up on any DLP dashboard.
Invest in identity and provenance infrastructure. Audit trails for AI need to capture not just the what, but the why. That means building attribution layers into LLM usage, data pipelines, and agent orchestration frameworks.
Shift detection focus to early-stage compromise. If the attacker’s goal is to stay in, then detection needs to happen during injection, not just when the damage is done. This requires monitoring at ingestion points, model interaction boundaries, and even within output patterns.

Rethinking the Economics of Detection and Prevention

All of this, i.e., identity tracking, intent monitoring, behavior auditing, sounds great in theory. But let’s be real: it’s expensive.

If AI systems require deeper logging, more granular attribution, and earlier-stage detection, that means retaining more data and instrumenting more infrastructure than most security teams are used to. And that introduces a new challenge: how do you justify the cost of this to an executive team that still thinks in terms of ransomware and breach reports?

It’s tricky because the economics of AI threats are still immature. We don’t have the same clean headlines or postmortems as we do for traditional data breaches. There’s no public catalog of “prompt injection incidents that led to business loss.” That means ROI for detection and response tooling becomes harder to prove, just as its cost goes up.

So what happens?

We’re likely going to see a rebalancing of security investment strategy. For the past decade, the pendulum has swung heavily toward detection and response. Companies invested in SIEMs, MDRs, playbooks, and alert pipelines because breaches were obvious and the response was measurable.

But with AI, where compromise may be invisible, that balance doesn’t hold.

We’ll need to invest more in prevention again at the input boundary, at the data layer, and within model access control. Guardrails, sanitizers, and verification tools may not stop everything, but they might be the only cost-effective option in the near term.

In other words, we’re re-entering a prevention-first security era, not because it’s better, but because it’s more operationally scalable when manipulation is cheap and detection is unclear.

The smartest teams will:

Log aggressively but with architectural awareness of cost and cardinality.
Invest in explainability and trust layers, so when things go wrong, they can prove how.
Pilot prevention measures now, while the stakes are still manageable—and before manipulation becomes the new breach headline.

Final Thought

Security has always lagged a bit behind how systems evolve. But AI is accelerating that lag. It’s not just a new layer to protect, but it’s a new kind of threat altogether.

We’re not trying to keep data in. We’re trying to keep decisions trustworthy.

That means we can’t just keep reacting, but we need to start building again.

Security needs to build again

Frank Wang

March 13, 2024

Read full story

It’s becoming clear that part of my broader thesis is starting to play out: security teams need to get more technical. They need to deeply understand AI systems, not just how to audit them, but how to instrument, shape, and secure them from first principles. That requires engineering investment, not just process or tooling.

Detection and response won’t go away, but prevention has to rise again. And the only way to strike the right balance between the two, especially in a world of opaque models, autonomous agents, and probabilistic behavior, is to design and build the right foundations ourselves.

Security has become reactive by default. But the only way to be proactive, especially in this new era, is through engineering.

And that’s where the next wave of security leadership will need to live.

Frankly Speaking

Security for AI: A lot is unknown

Security needs to build again

Discussion about this post