Top LLM vulnerabilities and how to mitigate the associated risk
As large language models (LLMs) become more prevalent, a comprehensive understanding of the LLM threat landscape remains elusive. But this uncertainty doesn’t mean progress should grind to a halt: exploring AI is essential to staying competitive, and CISOs are under intense pressure to understand and address emerging AI threats.
While the AI threat landscape changes every day, there are a handful of LLM vulnerabilities that we know pose significant risk to enterprise operations today. If cyber teams have a strong grasp on what these vulnerabilities are and how to mitigate them, enterprises can continue innovating with LLMs without taking on undue risk.
1) Prompt and data leakage
With LLMs, the possibility of data leaks is a real and growing concern. LLMs can be “tricked” into disclosing sensitive enterprise or user information, leading to a host of privacy and security concerns. Prompt leaks are another big issue. If a malicious user gets access to the system prompt, a company’s intellectual property could be compromised.
Both vulnerabilities are associated with prompt injection, an increasingly popular and dangerous hacking technique. Both direct and indirect prompt injection attacks are becoming common, and the consequences can be severe. Successful prompt injection attacks can lead to cross-plugin request forgery, cross-site scripting and training data extraction, each of which puts company secrets, personal user data and essential training data at risk.
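One last line of defense against both prompt and data leakage is to screen model output before it reaches the user. The sketch below is a minimal illustration, not a complete control: it scans a response for verbatim fragments of the system prompt and for a couple of common sensitive-data patterns. The SYSTEM_PROMPT value, the regex patterns and the llm.generate call are assumptions standing in for your own application.

```python
import re

# Hypothetical system prompt; in practice this is loaded from secure configuration.
SYSTEM_PROMPT = "You are AcmeCorp's support assistant. Internal discount rules are confidential."

# Illustrative patterns that often indicate sensitive data slipping into a response.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # US SSN-like strings
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),   # embedded API keys
]

def response_is_safe(response: str, system_prompt: str = SYSTEM_PROMPT) -> bool:
    """Return False if a response appears to leak the system prompt or sensitive data."""
    # Flag verbatim reuse of any reasonably long fragment of the system prompt.
    for fragment in system_prompt.split(". "):
        if len(fragment) > 20 and fragment.lower() in response.lower():
            return False
    # Flag responses that match known sensitive-data patterns.
    return not any(p.search(response) for p in SENSITIVE_PATTERNS)

# Usage: only return the model's answer if it passes the leak check.
# answer = llm.generate(user_prompt)            # hypothetical LLM call
# reply = answer if response_is_safe(answer) else "Sorry, I can't share that."
```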
To mitigate these risks, enterprises need to implement a system of checks throughout the AI application development lifecycle. From sourcing and processing data to selecting and training the model, every step should bake in limitations that lower the risk of a breach. Routine security practices such as sandboxing, whitelisting and API gateways are just as valuable (if not more so) when dealing with LLMs. Beyond that, teams should carefully vet all plugins before integrating them with an LLM application, and human approval should remain mandatory for all high-privilege tasks.
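That last point, human approval for high-privilege tasks, can be enforced directly in the code path that executes LLM-requested actions. The sketch below is a minimal example under assumed names: the tool registry, the HIGH_PRIVILEGE_TOOLS set and the interactive input() prompt are placeholders for whatever tool-calling and approval workflow your application actually uses.

```python
# Names of high-privilege tools and the approval mechanism (interactive input)
# are illustrative; substitute your own tool registry and approval workflow.
HIGH_PRIVILEGE_TOOLS = {"delete_record", "send_payment", "modify_permissions"}

def execute_tool(tool_name: str, arguments: dict, registry: dict):
    """Run an LLM-requested tool, pausing for human approval on high-privilege actions."""
    if tool_name not in registry:
        raise ValueError(f"Tool '{tool_name}' is not on the approved allowlist")

    if tool_name in HIGH_PRIVILEGE_TOOLS:
        decision = input(f"LLM wants to call {tool_name} with {arguments}. Approve? [y/N] ")
        if decision.strip().lower() != "y":
            return {"status": "rejected", "tool": tool_name}

    return registry[tool_name](**arguments)

# Usage with a hypothetical registry of callables:
# registry = {"delete_record": delete_record, "lookup_order": lookup_order}
# execute_tool("delete_record", {"record_id": 42}, registry)
```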
2) Compromised model performance
The effectiveness of AI models hinges on data quality. But training datasets are vulnerable to hackers at every stage of the model development process, from pre-training to fine-tuning and embedding.
Most enterprises leverage third-party models, where the data is managed by an unknown party, and cyber teams can’t blindly trust that it hasn’t been tampered with. Whether you use a third-party model or your own, there will always be a risk of “data poisoning” by bad actors, which can significantly degrade model performance and, in turn, harm a brand’s reputation.
The open-source AutoPoison framework provides a clear overview of how data poisoning can affect a model during the instruction-tuning process. Below are several strategies cyber teams can implement to mitigate this risk and maximize AI model performance.
- Supply chain scrutiny: Scrutinize the supply chain to verify that data sources are clean with airtight security measures. Ask questions such as “How was the data collected?” and “Were proper consent and ethical considerations taken into account?” You can also inquire about who labeled and annotated the data, their qualifications, and whether any biases or inconsistencies are present in the labels. Additionally, address matters of data ownership and licensing, including who owns the data and what the licensing terms and conditions are.
- Data sanitization and scrubbing: Check all data and sources before they go into the model. For instance, PII must be redacted before the data is used for training or fine-tuning (a minimal redaction sketch follows this list).
- Red team exercises: Conduct LLM-focused red team exercises during the testing phases of the model’s lifecycle. Specifically, prioritize testing scenarios that involve manipulating the training data to inject malicious code, biases, or harmful content, and employ a diverse range of attack methods, including adversarial inputs, poisoning attacks, and model extraction techniques.
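To make the sanitization step concrete, here is a minimal redaction sketch that strips a few common PII types from raw records before they enter a training set. The regex patterns and record format are simplifying assumptions; a production pipeline would pair a dedicated PII-detection tool with human review.

```python
import re

# Illustrative PII patterns; a real pipeline would rely on a dedicated
# PII-detection library plus human review rather than a handful of regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub_record(text: str) -> str:
    """Replace detected PII with typed placeholders before the record enters a training set."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

# Usage on a batch of raw training records:
# clean_records = [scrub_record(record) for record in raw_records]
```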
3) Compromised interconnected systems
Advanced models like GPT-4 are often integrated into systems where they communicate with other applications. But anytime there’s an API involved, there’s a risk to downstream systems. This means one malicious prompt can have a domino effect on interconnected systems. To reduce this risk, consider the following:
- If the LLM is allowed to call external APIs, request user confirmation before executing potentially destructive actions (an illustrative check is sketched after this list).
- Review LLM outputs before disparate systems are interconnected. Check them for potential vulnerabilities that could lead to risks like remote code execution (RCE).
- Pay particular attention to scenarios in which these outputs facilitate interactions between different computer systems.
- Implement robust security measures for all APIs involved in the interconnected system.
- Use strong authentication and authorization protocols to protect against unauthorized access and data breaches.
- Monitor API activity for anomalies and signs of suspicious behavior, such as unusual request patterns or attempts to exploit vulnerabilities.
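As a minimal illustration of the first two points, the sketch below vets an API call proposed by an LLM before it is forwarded downstream: unknown hosts are blocked, read-only methods pass, and anything destructive requires explicit user confirmation. The host allowlist, the “safe” method set and the forward_request call are assumptions, not a prescribed implementation.

```python
from urllib.parse import urlparse

# Illustrative allowlist of downstream hosts and "safe" read-only HTTP methods;
# real values would come from your API gateway or service catalog.
ALLOWED_HOSTS = {"internal-api.example.com", "reports.example.com"}
SAFE_METHODS = {"GET", "HEAD"}

def vet_llm_api_call(method: str, url: str, confirmed_by_user: bool = False) -> bool:
    """Decide whether an API call proposed by the LLM may be forwarded downstream."""
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        return False                     # unknown host: block outright
    if method.upper() in SAFE_METHODS:
        return True                      # read-only calls pass automatically
    return confirmed_by_user             # writes and deletes need explicit confirmation

# Usage:
# if vet_llm_api_call("DELETE", "https://internal-api.example.com/records/42"):
#     forward_request(...)               # hypothetical downstream call
```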
4) Network bandwidth saturation
Network bandwidth saturation vulnerabilities can be exploited by attackers as part of a denial-of-service (DoS) attack and can have a painful effect on LLM usage costs.
In a model denial of service attack, an attacker engages with the model in a way that excessively consumes resources, such as bandwidth or system processing power, ultimately impairing the availability of the targeted system. The result for the enterprise is service degradation and a sky-high bill. Because DoS attacks are not new to the cybersecurity landscape, several established strategies can be used to defend against model denial of service attacks and reduce the risk of rapidly rising costs.
- Rate limiting: Implement rate limiting to prevent your system from being overwhelmed by an excessive number of requests. The right rate limit for your application will depend on model size and complexity, hardware and infrastructure, and the average number of requests and peak usage times (a minimal sketch combining rate and character limits follows this list).
- Character limits: Set limits on the number of characters a user can include in a query to shield your LLM-based API from resource exhaustion.
- Framework-provided methods: Leverage methods provided by framework providers to fortify defenses against attacks. For instance, if you’re using LangChain, consider utilizing the max_iterations parameter.
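The rate-limit and character-limit ideas above can be combined into a small admission check that sits in front of the model, as in the sketch below. The specific limits, the client_id scheme and the llm.generate call are assumptions to be tuned to your own traffic patterns. (LangChain’s max_iterations setting, mentioned above, addresses a related concern by capping how many steps an agent can take for a single request.)

```python
import time
from collections import defaultdict, deque

# Illustrative limits; tune them to your model size, hardware and observed peak usage.
MAX_PROMPT_CHARS = 4_000      # character limit per query
MAX_REQUESTS = 30             # requests allowed per client...
WINDOW_SECONDS = 60           # ...within this sliding window

_request_log = defaultdict(deque)

def admit_request(client_id: str, prompt: str) -> bool:
    """Apply a character limit and a sliding-window rate limit before invoking the LLM."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return False

    now = time.monotonic()
    window = _request_log[client_id]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()          # drop requests that have aged out of the window
    if len(window) >= MAX_REQUESTS:
        return False

    window.append(now)
    return True

# Usage:
# if admit_request(user_id, prompt):
#     answer = llm.generate(prompt)      # hypothetical LLM call
# else:
#     answer = "Request rejected: rate or size limit exceeded."
```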
Safeguarding LLMs requires a multifaceted approach, involving careful consideration of data handling, model training, system integration, and resource usage. But by implementing the recommended strategies and staying vigilant, enterprises can harness the power of LLMs while minimizing the associated risks.