AI Proxies
Evolution of an existing and long-standing market
Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer or any other entities with which I am affiliated.

I’m back to writing about the markets that will likely become massive as AI adoption proliferates across enterprises. Last week, we looked at how macro security budgets are fundamentally shifting toward elite generalists. This week, I want to talk about a specific structural primitive that these generalists will use to regain control of their networks: AI proxies.
If you Google the term “AI proxy” today, you mostly find a handful of open-source projects and a couple of high-level articles by legacy gateway players like Kong and Nginx explaining how you can use their technology to intercept AI traffic. In my opinion, these are highly unsatisfying solutions that miss the deeper architectural shift.
Let’s start with the actual problem we want to solve. Yes, it’s shocking that I’m defining a problem first rather than describing a pre-existing market full of clean solutions that we are supposed to believe exist, even when they aren’t grounded in first principles. Sorry, I digress.
Lately, social media and the news have been filled with wild stories about what happens when you let autonomous AI agents roam free and vibe code inside a production environment. For example, there was the recent incident with PocketOS, where a Cursor agent mistakenly wiped an entire production database. That incident reads mostly as an indictment of basic software engineering posture (why weren’t immutable backups or restricted schema privileges in place?), but stories like it keep handing traditional security teams more ammunition to slow down AI adoption. It is easy to find, or anecdotally construct, a terrifying story about why we need to put the brakes on LLMs due to misconfigurations and data leaks.
But as I’ve always advocated, trying to block AI is a losing battle. Its adoption is inevitable because it accelerates software development at a scale we have never seen before. You can see it in how fast teams are adopting tools like Claude Code and Codex, forcing a massive competitive race where Anthropic has rapidly closed the gap with OpenAI. Attackers are already leveraging AI daily, so every day a security team spends trying to prevent internal adoption is valuable time lost that should have been spent learning how to make AI work for them. The risk is evolving, and we should spend our cycles adapting to the runtime rather than pretending we can block the API.
To give security teams peace of mind without killing developer velocity, we need to lean on a concept that is actually nothing new: proxies.
Proxies are an age-old architectural tool that security has always used to regulate and govern new technology waves. We used web application firewalls to handle web traffic, and cloud access security brokers (CASBs) to govern SaaS usage. We use proxy-like abstractions from Cloudflare and Akamai to protect modern APIs. The pattern repeats because it works. We desperately need guardrails around AI, and an inline proxy is the most logical way to inject them.
Local vs. Cloud agents: The architectural split
When we look at how AI agents interact with an enterprise, the threat surface splits into two main environments: local agents running on endpoints and cloud agents running inside production infrastructure.
Local agents are the immediate hurdle. These are direct calls to LLM APIs originating from developer laptops via IDE extensions and terminal tools. Cloud agents, on the other hand, operate deeper within your production infrastructure to orchestrate backend workflows. Right now, only SaaS providers and highly advanced engineering organizations are running fully autonomous cloud agents, but they will become standard across the board over time.
Technically, both of these environments are just endpoints making API calls, but history shows us that security platforms struggle to secure both simultaneously. Giants like CrowdStrike and SentinelOne built dominant businesses on laptop endpoints but struggled to capture the cloud, leaving the door wide open for companies like Wiz to dominate cloud infrastructure security. Cloud and infrastructure workloads operate on entirely different patterns, velocities, and privilege models than a developer’s laptop.
Because of this inherent friction, I imagine the market will split into two separate product types to handle local and cloud agents independently, mirroring the historical divide between traditional endpoint protection and cloud security posture management.
The real technical hurdle: Streaming token latency
But if you want to understand whether an AI proxy is legitimate or just a marketing wrapper, you have to look at how it handles the streaming token problem. A viable product in this space must maintain exceptionally low latency while processing a massive volume of concurrent requests.
Traditional web proxies look at static HTTP payloads. They intercept a request, scan the complete block of text for a signature or a social security number, and either block it or let it pass. That model fails completely when applied to LLMs.
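For contrast with what follows, here is a minimal sketch of that buffer-then-scan model. The function names and the single SSN-style pattern are hypothetical and chosen only for illustration; real gateways use far richer detectors. The point is that nothing reaches the client until the whole body has been collected.

```python
import re

# Hypothetical detector for illustration: a US SSN-shaped string.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_complete_payload(body: str) -> str:
    """Return "block" if the full body contains an SSN-like string, else "allow"."""
    return "block" if SSN_PATTERN.search(body) else "allow"


def buffered_gateway(chunks):
    """Old-school gateway: collect every chunk first, then scan once.

    The caller receives the verdict and the body only after the
    upstream response has finished, however long that takes.
    """
    body = "".join(chunks)  # nothing is forwarded before this point
    return scan_complete_payload(body), body
```

This works fine for a static HTTP payload, because the payload is already complete when it arrives at the proxy.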
AI interactions are heavily reliant on real-time streaming tokens. When a developer uses an autocomplete function in their IDE, the tokens are fed to their screen millisecond by millisecond. If an AI proxy acts like an old-school gateway, i.e., holding the streaming response back until it can inspect the entire paragraph for a security violation, it adds massive latency. If your security tool adds even 200ms of lag to a developer’s terminal, it ruins the interactive experience, and engineers will immediately write a script to bypass it.
A modern AI proxy has to be engineered from the ground up to inspect data streams on the fly. It needs to evaluate context window shifts, scan for prompt injection techniques, and mask secrets dynamically within the token stream without breaking the connection or adding perceptible lag. This is a massive engineering challenge that traditional network architectures simply aren’t built to handle.
That said, developers may be able to tolerate some added latency, since a model like Claude already takes a noticeable amount of time to respond.
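To make the on-the-fly inspection idea concrete, here is a rough sketch of masking a secret inside a token stream without buffering the whole response. It is an assumption-laden illustration, not how any vendor mentioned here implements it: the only detector is a fixed-length AWS-access-key-ID-shaped pattern, and `redact_stream`, `SECRET`, and `HOLDBACK` are hypothetical names. The trick is to hold back only as many characters as could still be the start of a secret that straddles two chunks.

```python
import re
from typing import Iterable, Iterator

# Assumption for illustration: one fixed-length pattern shaped like an
# AWS access key ID ("AKIA" followed by 16 uppercase letters or digits).
SECRET = re.compile(r"AKIA[0-9A-Z]{16}")
SECRET_LEN = 20
# An incomplete secret at the end of the buffer has at most
# SECRET_LEN - 1 characters, so that is all we need to hold back.
HOLDBACK = SECRET_LEN - 1


def redact_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the stream with secrets masked, delaying only a short tail."""
    pending = ""
    for chunk in chunks:
        # Mask any secret that is now complete, including one that
        # started in an earlier chunk and was held back.
        pending = SECRET.sub("[REDACTED]", pending + chunk)
        if len(pending) > HOLDBACK:
            cut = len(pending) - HOLDBACK
            yield pending[:cut]      # safe to forward immediately
            pending = pending[cut:]  # may still be the start of a secret
    if pending:
        yield pending  # stream ended; flush the remaining tail


if __name__ == "__main__":
    stream = ["export KEY=AKIAABCD", "EFGHIJKLMNOP", " # done"]
    print("".join(redact_stream(stream)))
```

In this sketch the added delay is bounded by 19 characters rather than by the length of the response. Semantic checks such as prompt injection detection do not reduce to a fixed-length regex, which is a large part of why the real problem is so much harder.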
The infrastructure moat
This exact dynamic explains why companies like Zscaler and Cloudflare became so massively successful in the SWG and CASB worlds. They understood early on that performance is everything. You cannot deliver low latency and handle millions of concurrent requests across global teams if you are renting generic compute or routing through sloppy third-party networks.
Having your own dedicated global infrastructure is the ultimate moat in network security. Netskope spent years trying to route through third-party infrastructure before realizing they had to build their own global network to make the margins and the performance profile make sense. If an AI proxy doesn’t sit on a highly optimized, distributed network, the request throughput will crush it.
When you look at infrastructure access management and identity proxies like Teleport, StrongDM, and a more recent startup called Formal AI, you see a completely different profile. These are highly specialized proxies designed to grant secure access to non-public infrastructure, like a production database or a development box, without exposing them to the raw internet. Unlike SWGs, these proxies don’t handle massive, high-volume web traffic because their footprint is restricted to specific engineering sessions.
Among all of these players, only one has built an explicit product around intercepting and governing AI traffic: Formal AI. (Disclaimer: I am a customer of Formal AI and Cloudflare, and I’ve used Teleport in the past.)
Why haven’t the others jumped on this yet? Teleport and StrongDM run infrastructure that simply isn’t engineered to handle high-volume, continuous LLM data streams. Legacy giants like Zscaler and Netskope are currently blinded by the sheer size of the traditional cloud security market and are optimized for a slower corporate buyer. They are likely going to miss the initial window, which is wild considering they already own the global network infrastructure required to intercept AI traffic at scale. That traffic will eventually mean handling tokens and prompts at a greater volume than standard web payloads.
The policy moat: programmable security
There is a deeper philosophical issue with the legacy proxy vendors. Almost all of them fail to provide a flexible way for users to define and enforce custom, programmatic policies. They offer rigid, out-of-the-box checkboxes, but they don’t give you a true code-based engine to write your own rules. This shouldn’t surprise anyone who has worked in this space for a while; security teams have historically not been good at programming, and vendors built their UIs to cater to that lack of technical depth.
Formal AI, by contrast, is built on the thesis that the next generation of security engineers will be highly technical practitioners who want to treat policy as code. They treat security engineers like developers who need a programmable proxy to parse, inspect, and mutate AI prompts and responses in real time.
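As a rough illustration of what "policy as code" could look like for an AI proxy, here is a small sketch. It is not Formal AI's API or anyone else's; every name in it (`PromptRequest`, `Decision`, `evaluate`, the model names, and the `internal.example.com` hostname pattern) is hypothetical. Each policy is an ordinary function that can allow, block, or mutate a prompt, and the engine runs them in order.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, Sequence


@dataclass(frozen=True)
class PromptRequest:
    user: str
    model: str
    prompt: str


@dataclass(frozen=True)
class Decision:
    allowed: bool
    request: PromptRequest  # possibly mutated by the policy
    reason: str = ""


Policy = Callable[[PromptRequest], Decision]


def deny_unapproved_models(req: PromptRequest) -> Decision:
    """Block requests to models outside a (hypothetical) approved list."""
    approved = {"model-a", "model-b"}
    if req.model not in approved:
        return Decision(False, req, f"model {req.model!r} is not approved")
    return Decision(True, req)


def mask_internal_hostnames(req: PromptRequest) -> Decision:
    """Mutate the prompt so internal hostnames never leave the network."""
    masked = re.sub(r"\b[\w-]+\.internal\.example\.com\b", "[HOST]", req.prompt)
    return Decision(True, replace(req, prompt=masked))


def evaluate(req: PromptRequest, policies: Sequence[Policy]) -> Decision:
    """Run policies in order; the first block wins, mutations accumulate."""
    for policy in policies:
        decision = policy(req)
        if not decision.allowed:
            return decision
        req = decision.request
    return Decision(True, req)
```

Because the rules are plain functions, they can be unit tested, code reviewed, and versioned like any other software, which is the contrast with checkbox-driven policy UIs.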
Eventually, the endpoint giants like CrowdStrike and SentinelOne will try to enter this race as well. But right now, their main enterprise customer bases aren’t actively asking for AI proxy guardrails. The market feels small today because security teams are still stuck in a state of paralysis, trying to figure out how to block AI entirely rather than architecting a proxy solution. There simply isn’t enough education or sophisticated marketing around the concept yet.
The path forward for AI proxies
While Formal AI has a clear first-mover advantage, their current architecture requires customers to self-host the proxy. This places a significant operational burden on internal infrastructure teams, who have to manage the scaling, availability, and latency of a critical-path dev tool. To maintain their lead, they will eventually have to transition to a fully hosted model and invest heavily in their own global cloud infrastructure to survive the raw traffic load.
Building a basic proxy wrapper isn’t rocket science, but building a highly nuanced product that understands semantic context, prompt injection, and data loss prevention at the edge without adding latency to a developer’s IDE is incredibly difficult.
I find it hard to see legacy vendors like Zscaler or Netskope executing well here; their DNA is too corporate and far removed from the developer workflow. Teleport and StrongDM lack the global routing networks. Cloudflare remains the only legacy competitor with a real shot at winning this space because of their native closeness to developers. However, it feels like they have recently lost touch with the new generation of engineers who are building entirely with autonomous agents. Cloudflare seems content to double down on their core CDN and traditional traffic proxying rather than building opinionated runtime controls for AI.
Ultimately, this category will belong to the platforms that understand security engineers are now builders, not auditors. The winning AI proxy won’t just block bad strings; it will act as a highly programmable, low-latency translation layer that allows enterprises to embrace agentic velocity safely.



