Red teaming: The key ingredient for responsible AI

Developing responsible AI isn’t a straightforward proposition. On one side, organizations are striving to stay at the forefront of technological advancement. On the other hand, they must ensure strict compliance with ethical standards and regulatory requirements.

AI red teaming

Organizations attempting to balance this thin line between rapid innovation and increasing regulatory requirements will need to employ a standardized approach to development, ensuring they remain compliant and competitive in an increasingly crowded market.

AI innovation at risk

Many businesses are already struggling to decipher an increasingly tangled knot of regulations, including the (upcoming) Cyber Resilience Act and Data Act.

Although the recent EU AI Act has taken a significant step towards AI safety, the law has also created additional bureaucracy. It has sparked calls from the European Parliament to make compliance with the Act easier by simplifying administration requirements and clarifying grey legal areas. Plus, there are requests for better funding of AI research and support to help small businesses get to grips with the legislation. Without these adjustments to the act, there are genuine concerns that the EU will be unable to establish itself as a front-runner in the field and lose out to the US and China.

The UK government has taken a more pro-innovation stance. Rather than introducing new laws, its AI white paper proposes five high-level principles for existing regulators to apply within their jurisdictions, focusing on safety, fairness, transparency, accountability, and user rights. These broader principles are less prescriptive than the EU’s Act. In fact, they align well with the goals of red teaming, an already trusted ingredient of IT security testing procedures.

AI red teaming: defining and reducing risk, without stifling innovation

To regulate a technology, you must understand it. Part of the challenge with overly rigid regulation is that it assumes we already know how to limit the risks of AI from both a safety and security perspective — but that’s not the case.

We’re still regularly discovering new weaknesses in models from a traditional security perspective, like AI models leaking data, and safety perspectives, like models producing unintended and harmful imagery or code. These risks are still being discovered and defined by the global researcher community so until we better understand and define these challenges, the best course of action is to remain diligent in stress-testing AI models and deployments.

Red teaming exercises are one of the best ways to find novel risk, making them ideal for finding security and safety concerns in emerging technologies like generative AI. This can be done using a combination of penetration testing, time-bound offensive hacking competitions, and bug bounty programs. The result is a comprehensive list of issues and actionable recommendations, including remediation advice.

With this clear focus on safety, security, and accountability, red teaming practices are likely to be considered favorably by regulators worldwide, as well as aligning with the UK government’s vision for responsible AI development.

Another advantage of setting up red teaming as a method of AI testing is that it can be used for both safety and security. However, the execution and goals are different.

For safety issues, the focus is on preventing AI systems from generating harmful information; for example, blocking the creation of content on how to construct bombs or commit suicide and preventing the display of potentially upsetting or corrupting imagery, such as violence, sexual activity, and self-harm. Its aim is to ensure responsible use of AI by uncovering potential unintended consequences or biases, guiding developers to proactively address ethical standards as they build new products.

A red teaming exercise for AI security takes a different angle. Its objective is to uncover vulnerabilities to stop malicious actors from manipulating AI to compromise the confidentiality, integrity, or availability of an application or system. By quickly exposing flaws, this aspect of red teaming helps identify, mitigate, and remediate security risks before they are exploited.

For a real-world indication of its capabilities, the launch of Bard’s Extensions AI feature provides a valuable example. This new functionality enabled Bard to access Google Drive, Google Docs, and Gmail, but within 24 hours of going live, ethical hackers identified issues demonstrating it was susceptible to indirect prompt injection.

It put personally identifiable information (PII) at severe risk, including emails, drive documents, and locations. Unchecked, this vulnerability could have been exploited to exfiltrate personal emails. Instead, ethical hackers promptly reported back to Google via their bug bounty program, which resulted in $20,000 in rewards – and a potential crisis averted.

Talent diversity makes a difference

This quality of red teaming relies on carefully selected and diverse skill sets as the foundation for effective assessments. Partnering with the ethical hacking community through a recognized platform is a reliable way of ensuring talent is sourced from different backgrounds and experiences, with relevant skills necessary for rigorously testing AI.

Hackers are renowned for being curiosity-driven and thinking outside of the box. They offer organizations external and fresh perspectives on ever-changing security and safety challenges.

It’s worth noting that when red teaming members are given the opportunity to collaborate, their combined output becomes even more effective, regularly exceeding results from traditional security testing. Therefore, facilitating cooperation across teams is a key consideration. Getting a blend of individuals with a variety of skills and knowledge will deliver the best results for AI deployments.

Devising the best bug bounty programs

Tailoring the incentive model for an ethical hacking program is vital, too. The most efficient model includes incentivizing hackers according to what is most impactful to an organization, in conjunction with bounties for achieving specific safety outcomes.

Building on the established bug bounty approach, this new wave of red teaming addresses the novel security and safety challenges posed by AI that businesses must address before launching new deployments or reviewing existing products.

Targeted offensive testing that harnesses the collective skills of ethical hackers proficient in AI and LLM prompt hacking will help strengthen systems and processes alike. It will guard against potential vulnerabilities and unintended outcomes missed by automated tools and internal teams. Importantly, it ensures the creation of more resilient and secure AI applications that uphold the principles of “responsible AI.”

Don't miss