AI security posture management will be needed before agentic AI takes hold
As I’m currently knee deep in testing agentic AI in all its forms, as well as new iterations of current generative AI models such as OpenAI’s O1, the complexities of securing AI bot frameworks for enterprise security teams are beginning to crystallize.
The first comparison that comes to mind is what we experienced during the “on-prem to cloud” days, when we were suddenly faced with limitations in our existing security toolsets.
Your on-prem vulnerability scanner, which was used to scan a database or VM server at an IP address, suddenly didn’t know what an AWS S3 bucket or Azure Blob was. It could not understand the context of a Lambda function or even how to scan it, and you were left with “invisible” vulnerabilities because the scanner was effectively blind to these assets. This is how cloud-native application protection platforms (CNAPPs) – tools that could do both the traditional scanning and scanning of cloud assets – came to be.
We now are faced with a similar problem with agentic AI and what will soon be an entire fabric of interconnected agentic AI frameworks across your enterprise.
Let’s break the problem down first
If we take Microsoft’s agentic AI offering (Copilot Studio), which already sits on top of its own flavor of OpenAI’s GPT, the bots can be layered and deployed as standalone web applications, mobile applications, Teams channels or interactive portals embedded in client-facing applications or even social media apps such as Facebook.
They contain their own security configurations, e.g., regarding authentication (Do you authenticate to them? If yes, how?). Then there’s cross-platform triggers and associated actions dependent on user generated content (e.g. Do I call the external application API of platform XYZ if the user says this keyword?). There’s even settings on content moderation that can hamper native AI vulnerabilities like prompt injection.
Anthropic’s implementation of agentic AI – simply called “computer use” – leverages the latest iteration of Claude (Sonnet 3.5) to effectively allow the agent native access to a user environment, right down to browser and file system actions. This, too, has its own settings. For example, you can declare your own “tools” for the bot to use. But if it leverages Bash to run commands, what permissions does it have? What browser is it using for surfing the web? Are its outbound connections “proxified”? Is it using its default Python code for opening and editing system files? As Anthropic pointed out, Claude will occasionally follow instructions in files it fetches or processes that counteract the user’s initial commands, which means it can be “distracted” from its official mission and do something else entirely.
Now start to picture a framework of these bots working together: for example, a Copilot Studio-powered chatbot in the front-end, connected to ServiceNow in the back-end that’s generating tickets that are fed to a ChatGPT API-powered bot, with links to Nvidia’s blueprint agents for local workloads, and Claude interwoven for specific file-based actions. Every bot has different authentication methods (or for the public chatbot, none), different privileges, run with different datasets and connectors, have different triggers and even run on different reasoning models (Sonnet 3.5 by Anthropic “reasons” differently than OpenAI’s O1). How do you monitor or even scan this mess?
Finding solutions
We’ve run into these issues when most companies shifted their workloads to the cloud. Authentication issues – like the dreaded S3 bucket that had a default public setting and that was the cause of way too many breaches before it was secure by default – became the domain of cloud security posture management (CSPM) tools before they were swallowed up by the CNAPP acronym. Identity and permission issues (or entitlements, if you prefer) became the alphabet soup of CIEM (cloud identity entitlement management), thankfully now also under the umbrella of CNAPP.
AI bots will need to be monitored by similar toolsets, but they don’t exist yet. I’ll go out on a limb and suggest SAFAI (pronounced Sah-fy) as an acronym: Security Assessment Frameworks for AI. These would, much like CNAPP tools, embed themselves in agentless or transparent fashion, crawl through your AI bots collecting configuration, authentication and permission issues and highlight the pain points.
You’d still need the standard panoply of other tools to protect you, since they sit atop the same infrastructure. And that’s on top of worrying about prompt injection opportunities, which is something you unfortunately have no control over as they are based entirely on the models and how they are used.
Today, prompt injections are seen as an almost amusing highlight of AI bot use, especially on social media networks. Just look for “ignore previous instructions and…” (insert ASCII unicorn here) on any social media network and you’ll find some hilarious examples of supposed legitimate accounts caught posting AI generated content. These simple examples have largely been neutered by the AI models themselves but will never really go away (in the same vein that WAF bypasses are in constant flux).
Far sooner than we think we’ll be entering the era of large, interconnected AI bot frameworks woven together with disparate APIs where bots themselves will be the causes of data breaches. We need to develop tooling to scan and monitor them at scale across disparate vendors, so they don’t yet become another ghost asset.