Securing AI’s new frontier: Visibility, governance, and mitigating compliance risks

In this Help Net Security interview, Niv Braun, CEO at Noma Security, discusses the difficulties security teams face due to the fragmented nature of AI processes, tools, and teams across the data and AI lifecycle.

Braun also shares insights on how organizations can address these challenges and improve their AI security posture.

data AI lifecycle

How is the growing AI model sprawl impacting governance, and what strategies are being implemented to mitigate compliance risks?

The new focus on AI has spotlighted the underlying lifecycle of processes, tools, and technology teams use and build to harness the power of AI within their applications. The data engineering and machine learning teams that operate this data and AI lifecycle are completely separate from software development teams, and AppSec teams have no visibility into their work—but they’re still responsible for securing it.

Because traditional AppSec tools (SAST, DAST, SCA, API Security, WAF) do not support this new lifecycle, AppSec teams are unable to govern acceptable use or protect against new risks from data and AI supply chain misconfigurations and vulnerable open source models to GenAI threats. The first step to implementing governance controls and securing the new AI attack surface is visibility.

With AI workloads spread across multiple cloud providers (Azure, AWS, GCP), what are the critical challenges in maintaining a strong AI security posture?

This is a great question, but it goes beyond cloud infrastructure. While some CSPMs are rolling out solutions to secure AI in the cloud, the AI attack surface extends beyond cloud-hosted services. Many components are self-hosted or SaaS-based, requiring bespoke solutions to get visibility into the unique development environments (i.e., Jupyter Notebooks), data pipelines and MLOps tools (i.e., Databricks, Airflow), open-source components (i.e., from Hugging Face), AI-as-a-service usage (i.e., OpenAI), and AI models in runtime.

This sprawl of tools, components, and risks that spans the different phases of the data and AI lifecycle—from data curation and training to model development and operation—coupled with the lack of comprehensive security coverage makes it very challenging for security teams to maintain a strong AI security posture.

How can organizations protect the data used in AI inference and training, especially when dealing with sensitive information?

Securing and governing the use of data for AI/ML model training is perhaps the most challenging and pressing issue in AI security. Using confidential or protected information during the training or fine-tuning process comes with the risk that data could be recoverable through model extraction techniques or using common adversarial techniques (i.e., prompt injection, jailbreak). Following data security and least-privilege access best practices is essential for protecting data during development, but bespoke AI runtime threat detection is response is required to avoid exfiltration of data via model responses.

Another concern when it comes to data and AI is the use of data for training. More and more customers want to know whether their data is being used for model training purposes, which is currently very challenging for most organizations to understand. End-to-end data and model lineage tracking through ML-BOMs across the entire data and AI lifecycle—coupled with granular policies and automated enforcement—is the only way to govern training data usage.

How should organizations approach post-deployment AI system monitoring, and what mechanisms are essential for responding to AI-related incidents?

Securing AI applications in production is equally important as securing the underlying infrastructure and is a key component of maintaining a secure data and AI lifecycle. This requires real-time monitoring of both prompts and responses to identify, notify, and block security and safety threats.

A robust AI security solution prevents adversarial attacks like prompt injection, masks sensitive data to prevent exfiltration via a model response, and also addresses safety concerns such as bias, fairness, and harmful content. It should also allow you to configure and enforce granular, application-specific safety guardrails to prevent production models from violating organizational policies and prevent off-topic sessions such as discussions about competitors or giving financial or medical advice.

What are the potential risks of using open source AI models from sources like Hugging Face?

Data engineering and machine learning teams use open source models or the same reasons that software developers use open source packages/dependencies: to minimize boilerplate code and speed up development.

The open-source AI ecosystem is starting to explode with platforms like Hugging Face enabling easy sharing and downloading of models and datasets. However, these components can have similar risks to open source packages. Vulnerable or malicious models can make their way into source code, Notebooks, data pipelines, or elsewhere across the data and AI lifecycle, giving bad actors unauthorized access to sensitive data, systems, and IP, remote code execution, or backdoors.

Don't miss