Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer or any other entities with which I am affiliated.
I’m still running a 50% off sale for a yearly subscription on my blog for the holidays! If you have been waiting to get a subscription and/or have some remaining professional development budget, this is the time.
Last week, I talked about how to use AI for security and shared some ideas there. I discussed how AI will almost certainly be an important part of any business going forward, so any security team that hinders its usage will not fare well. It's worthwhile, then, to create some policies and apply them to understand the problem more deeply and lead by example.
In this newsletter, I discuss the reverse: how should we handle security for AI? There's a lot of content out there, as well as startups, which also means there's a lot of noise. I don't claim to know all the answers (or even to be right), but I plan to discuss some areas that are on my mind. Of course, I'll mention some startups and what I think of their products.
What is AI?
This might seem like a silly question, but it's important to understand the technology you plan to secure. Securing a technology you don't understand is a common failure mode in security, and it's hard to understand and communicate risk without context. I decided to ask ChatGPT what AI is, and this is what I got:
Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think, learn, and solve problems like humans. AI systems use algorithms and data to perform tasks that typically require human intelligence, such as understanding natural language, recognizing patterns, learning from experience, and making decisions.
I would say this is pretty accurate. One missing connection that I would call out more explicitly is that it's not just "algorithms and data," but rather that AI uses data to generate models and algorithms to perform tasks and make decisions. The data and the resulting models are key components. What does this mean for security? It gives us a sense of what we have to protect.
Security for AI isn’t completely new
Most of the basic security problems in AI are an extension of current security programs. OWASP has a good guide on this, and out of a few articles I read, Wiz has one of the better blog posts (so you don’t have to sift through the broader, noisy internet yourself). Of course, I also asked ChatGPT about security best practices for AI, and I got this:
Data Protection: Use encryption and strict access control mechanisms to protect the datasets used in AI models.
Model Auditing: Regularly audit AI models for vulnerabilities and biases, and ensure that they are explainable and transparent.
Robust Training Processes: Implement robust training techniques to mitigate the risk of poisoning and adversarial attacks, such as adversarial training and data validation.
API Security: Secure AI APIs with strong authentication and authorization mechanisms, and monitor for suspicious activity.
Continuous Monitoring: Continuously monitor AI systems for anomalies or malicious behavior during deployment.
This is pretty consistent with what others are writing, and if you look closely, it isn't that different from how we currently secure applications and products. Of course, models are slightly different from code, and AI systems are slightly different from production infrastructure, but the work isn't materially different except that you also collaborate with data scientists and AI engineers. My point is that, at AI's current state of maturity, securing it isn't a material departure from what security teams already do.
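To make that concrete, here's a rough sketch (my own illustration, not anyone's product) of what securing an AI system often looks like today: the same authentication and monitoring patterns we already put in front of any internal API, just applied to a model endpoint. The endpoint path, header name, and key store below are all hypothetical.

```python
# Minimal sketch: protecting a model-serving endpoint with the same controls
# we'd use for any internal API (authentication, logging). Names are hypothetical.
import logging
from flask import Flask, request, jsonify, abort

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)

# Hypothetical key store; in practice this would come from a secrets manager.
API_KEYS = {"team-analytics": "analytics-key-123"}

def predict(features):
    # Placeholder for a real model call.
    return {"score": 0.42}

@app.route("/v1/predict", methods=["POST"])
def predict_endpoint():
    key = request.headers.get("X-API-Key", "")
    caller = next((name for name, k in API_KEYS.items() if k == key), None)
    if caller is None:
        logging.warning("rejected request with invalid API key")
        abort(401)

    payload = request.get_json(silent=True) or {}
    # Continuous monitoring: record who called the model and with what shape of input.
    logging.info("model call by %s with %d fields", caller, len(payload))
    return jsonify(predict(payload))

if __name__ == "__main__":
    app.run(port=8080)
```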
In fact, AI use feels like it'll be less disruptive to security and engineering than SaaS and cloud were. Those technologies brought fundamental changes: data was no longer centralized and instead lived in individual applications; datacenter and physical security became less important because they were outsourced to cloud providers; and engineering velocity increased, with applications deployed more often and infrastructure spun up more easily.
These changes altered threat models and caused certain tools to stop working well. For example, security couldn't just plug in a firewall anymore, and application scanning tools became too slow for agile development. Similarly, data loss prevention (DLP) tools no longer made sense because data wasn't centralized on a company's own infrastructure but scattered across cloud environments and SaaS tools. It also changed the software model: SaaS companies now have to bear some infrastructure security responsibility rather than just application security. But, I digress.
What does this mean for tooling?
Again, this is for current use cases; in the future, this might change as we integrate AI more deeply into our applications and operations. Right now, it seems that most existing tools can extend into AI security, and doing so could be a differentiator. For example, current API security tools can protect AI APIs, and current data protection tools could ensure that the wrong data doesn't enter LLM training. There are tools like Knostic that try to do data protection and access control for AI specifically, but I don't believe a dedicated tool is necessary.
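As a rough sketch of what "keeping the wrong data out of LLM training" can look like, here's a hypothetical filter that drops records with obviously sensitive patterns before they reach a training corpus. A real data protection tool would do far more (classification, lineage, policy enforcement), but the control point is the same.

```python
# Hypothetical sketch: filter obviously sensitive records out of a training corpus.
# Real data protection tooling uses classifiers, lineage, and access policies;
# this only illustrates where the control sits in the pipeline.
import re

SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # US SSN-like number
    re.compile(r"\b\d{13,16}\b"),            # credit-card-like number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email address
]

def is_safe_for_training(record: str) -> bool:
    return not any(p.search(record) for p in SENSITIVE_PATTERNS)

def build_training_set(raw_records):
    kept, dropped = [], 0
    for record in raw_records:
        if is_safe_for_training(record):
            kept.append(record)
        else:
            dropped += 1
    print(f"kept {len(kept)} records, dropped {dropped} with sensitive patterns")
    return kept

if __name__ == "__main__":
    sample = [
        "customer asked about upgrading their plan",
        "reached me at jane.doe@example.com about the invoice",
        "ticket resolved, no follow-up needed",
    ]
    build_training_set(sample)
```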
Similarly, it's possible that code-scanning and testing tools can extend to identify model bias. However, teams building AI models already have tools to test whether those models are overfitted or biased in some way. There might be some room to monitor for malicious AI behavior, but that will likely be integrated into AI tooling rather than exist as dedicated security tooling.
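For a sense of what those checks look like in their simplest form, here's a toy sketch: compare training vs. held-out accuracy to flag overfitting, and compare accuracy across groups to flag obvious bias. The thresholds and data are made up for illustration; real model-evaluation tooling goes much deeper.

```python
# Toy sketch of two basic model checks: an overfitting gap and a per-group
# accuracy gap. Thresholds and data below are made up for illustration.

def accuracy(labels, preds):
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)

def overfitting_gap(train_labels, train_preds, val_labels, val_preds):
    return accuracy(train_labels, train_preds) - accuracy(val_labels, val_preds)

def per_group_accuracy(labels, preds, groups):
    scores = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        scores[g] = accuracy([labels[i] for i in idx], [preds[i] for i in idx])
    return scores

if __name__ == "__main__":
    # Toy validation set with a group attribute attached to each example.
    val_labels = [1, 0, 1, 1, 0, 1]
    val_preds  = [1, 0, 1, 0, 0, 0]
    groups     = ["a", "a", "a", "b", "b", "b"]

    gap = overfitting_gap([1, 1, 0, 0], [1, 1, 0, 0], val_labels, val_preds)
    by_group = per_group_accuracy(val_labels, val_preds, groups)

    if gap > 0.1:  # made-up threshold
        print(f"possible overfitting: train/val accuracy gap = {gap:.2f}")
    if max(by_group.values()) - min(by_group.values()) > 0.2:  # made-up threshold
        print(f"possible bias: per-group accuracy = {by_group}")
```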
Unlike the developer/software engineering market, the people working on AI are willing to buy vendors at the infrastructure level. Databricks and Palantir's valuations are evidence of this trend: traditionally, those working on AI sit on the data team and have bought tooling so that they don't need support from developers. In general, the data market is much bigger than both the security and developer markets. As a result, the most valuable security tools/features will be integrated into AI tooling.
My prediction is that we'll see plenty of AI security tooling pop up because current AI tooling lacks the capability. These tools will be short-lived, and AI tooling will disintermediate them.
What else can a security team do to provide security around AI?
I discussed the need to set policies, but I haven't said what should go in them. As with most security policies, the contents depend on the use case. The two main areas are the following:
SaaS AI security: SaaS tools that use AI and what data is being put in them.
Internal AI security: Internally developed AI capabilities and how they are being used.
SaaS AI security is a bit easier to handle because it's part of the third-party vendor review process. A security team should figure out what data the tool is ingesting and what security features the tool offers. Some of the security burden falls on the SaaS provider: like with all SaaS tools, the ones that succeed will have strong security controls and establish trust with the customer. I can imagine an AI-specific security certification emerging in the future; hopefully, it'll improve security rather than create unnecessary compliance tasks. If the data and the task are sensitive, I can see companies developing their own AI capabilities in-house to have more control over the models and how the data is used, but this is no different from how companies currently evaluate SaaS tools.
Internal AI security is a bit harder because there are multiple components. Right now, there aren't many established best practices, but based on what's described above, traditional security best practices apply; they're just focused on protecting data and models. Rather than picking random parts of the system to secure, it's important to develop a threat model and create some risk thresholds and scenarios. I provide some examples of how to do this in previous newsletters.
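As a toy illustration (not a template), here's what one threat-model scenario might look like when written down as data: an asset, a threat, rough likelihood and impact ratings, and the mitigations a team commits to. All of the entries below are hypothetical examples.

```python
# Toy illustration of threat-model entries for an internal AI system.
# The assets, threats, and ratings are hypothetical examples, not a template.
threat_model = [
    {
        "asset": "customer-support training corpus",
        "threat": "poisoning via unvetted third-party data source",
        "likelihood": "medium",
        "impact": "high",
        "mitigations": ["data validation before ingestion", "source allowlist"],
    },
    {
        "asset": "fine-tuned model weights",
        "threat": "exfiltration through an over-permissive model registry",
        "likelihood": "low",
        "impact": "high",
        "mitigations": ["least-privilege registry access", "access logging"],
    },
]

# A simple risk threshold: anything high-impact gets reviewed before launch.
for scenario in (s for s in threat_model if s["impact"] == "high"):
    print(f"review before launch: {scenario['asset']} ({scenario['threat']})")
```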
Security risk is hard (which is also the title of one of my newsletters), and it's especially hard for new areas. However, it's important to do this work because it builds a foundation to iterate and evolve as we, as a community (not just security but also AI), gain a better understanding. Threat modeling and security risk are especially important here because it's easy to get caught up in FUD and do something without fully understanding the threat, e.g., "AI models might leak data, so we need to monitor the outputs." A better way to think about it is that our data is valuable, so we should avoid feeding sensitive and valuable data to the AI in the first place. Then, it becomes clearer that the preventative measure is a data access control problem.
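To make that framing concrete, here's a minimal sketch assuming a hypothetical wrapper around whatever LLM API a team uses: only the fields a caller is allowed to share ever make it into the prompt, so the control is data access rather than output monitoring. The roles, fields, and send_to_llm function are made up for illustration.

```python
# Minimal sketch of treating leakage as a data access control problem:
# decide what a given caller may send to the model *before* the call is made.
# The roles, fields, and send_to_llm function are hypothetical.

ALLOWED_FIELDS_BY_ROLE = {
    "support_bot": {"ticket_id", "product", "issue_summary"},
    "analytics":   {"product", "region"},
}

def send_to_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call.
    return f"(model response to: {prompt[:40]}...)"

def ask_model(role: str, record: dict, question: str) -> str:
    allowed = ALLOWED_FIELDS_BY_ROLE.get(role, set())
    # Access control happens here: only fields this role is allowed to share
    # ever make it into the prompt.
    context = {k: v for k, v in record.items() if k in allowed}
    dropped = set(record) - allowed
    if dropped:
        print(f"withheld fields for role {role}: {sorted(dropped)}")
    return send_to_llm(f"Context: {context}\nQuestion: {question}")

if __name__ == "__main__":
    record = {
        "ticket_id": "T-1001",
        "product": "widget-pro",
        "issue_summary": "login fails after update",
        "customer_ssn": "123-45-6789",  # sensitive: should never reach the model
    }
    print(ask_model("support_bot", record, "Suggest a next troubleshooting step."))
```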
This isn't meant to be comprehensive, but these are some initial thoughts I had on AI security. I believe many companies focused on AI security are premature: what they're building is likely to end up as a feature or product of AI tooling, especially since the data market itself is so large. As use cases evolve, there might be space for an AI security company, but right now, it seems like existing security companies can extend their capabilities to help with AI security, and that extension can be a differentiator.