How can AI be made more secure and trustworthy?
While we’re still debating whether the singularity and superintelligence will ever arrive, and how long that might take, artificial intelligence is playing an increasingly important role in our everyday lives. Artificial intelligence – most commonly in the form of machine learning (ML) – is based on training algorithms with data instead of explicitly programming them. Such algorithms are already being used in applications ranging from HR to finance and from transport to medicine, and in use cases almost too numerous to mention.
The benefits of machine learning are obvious: it enables far faster analysis of vastly more data than any human, or even group of humans, is capable of. Many ML applications that surpass human capabilities already exist, such as those designed to play Go and chess, or to detect fraudulent insurance claims. Unlike past AI boom-and-bust cycles, we’re unlikely to see the return of an AI winter: current ML algorithms are generating enough value to justify continued research and funding. AI is here to stay – and set to become even more pervasive in both industry and our personal lives.
However, one hurdle still exists on the path to true AI success – trust. How can we trust AI when we’ve seen it make so many poor decisions?
Obstacles to reaching AI’s full potential
At the risk of oversimplifying the situation, I believe that there are just two fundamental aspects that must be addressed before AI can reach its full potential. Those are a proper understanding of what AI is capable of and how it should be used, and improvements to the security of AI.
To understand how machine learning works and how to use it properly, it is important to bear in mind that although some ML models are very complex, the systems incorporating ML are still just the product of combining an understanding of a domain with its data. Most current ML methods are designed simply to fit arbitrarily complex models to data according to some optimization criteria. The way these algorithms fit models to the data can sometimes cause a model to learn things that are not actually important, but that merely happen to help solve the optimization problem on that particular data set.
While training data is important when considering the performance of a model, the model is only as representative and balanced as its data. This has a couple of key implications: models rarely extrapolate well to unknown conditions, and any bias present in the data will be carried over into the model. For example, training a model on a dataset containing biased human decisions will produce a model that reflects those biases. There is no reason to expect the resulting model to be any less biased than the decisions in its training data – the model simply learns to replicate what is in the data. The research community is making promising advances towards more generic ML methods that combine knowledge of the problem with the actual data. Even so, many of today’s problems are not flaws in the process of machine learning itself – the technology is simply often used without an understanding of its limitations, which can lead to undesired consequences.
The unique value that ML brings is its ability to learn from data. This also happens to be its unique weakness from a security perspective. We know that sending unexpected inputs to even deterministic “classic” computing systems can cause unexpected behaviours. These unexpected behaviours often lead to the discovery of exploitable vulnerabilities and are the reason why methods like proper input validation and fuzzing are so valuable in testing.
When unexpected behaviours are found in traditional code, they can be fixed by editing the code. When unexpected behaviours are found in ML models, however, they cannot be fixed as easily by hand-editing; precautions need to be taken elsewhere. Since ML systems are used in an increasing number of high-value use cases, there is a growing incentive for adversaries to find and exploit the vulnerabilities inherent in those systems.
Many ML models are retrained periodically. Retraining is needed to, for instance, keep a model up-to-date based on the latest behaviours of a target audience, ensure that a system makes the best possible recommendations when a new video or song gains popularity, or enable a security system to detect new threats.
But the model retraining process itself enables attack vectors that work even when an adversary has no direct access to the system running the model. Such attacks could simply manipulate the model’s externally sourced input data. These are threats that current security solutions are not equipped to handle since they do not involve the identification of malicious files, detection of breaches in traditional computer security defences, or the detection of the presence of a malicious actor in a system or network. Even classic input data validation approaches such as outlier detection and tracking of distributions in data over time are often not able to detect such attacks, since the best way to manipulate an ML model is usually to make very subtle changes to its input data.
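To make that last point concrete, here is a minimal sketch – in Python, using purely synthetic data and illustrative numbers – of the kind of distribution tracking described above, and of why a sufficiently subtle manipulation can slip straight past it:

```python
# Minimal sketch of "classic" input validation: tracking a feature's
# distribution over time with a two-sample test. All data here is
# synthetic and the numbers are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(8)
reference = rng.normal(loc=0.0, scale=1.0, size=10_000)  # e.g. last month's feature values

# A crude attack that shifts the whole distribution is easy to spot...
blatant = rng.normal(loc=1.0, scale=1.0, size=10_000)
# ...while a subtle attack nudges only a handful of targeted samples.
subtle = rng.normal(loc=0.0, scale=1.0, size=10_000)
subtle[:20] += 0.5

print("blatant shift p-value:", ks_2samp(reference, blatant).pvalue)  # ~0, flagged
print("subtle shift p-value :", ks_2samp(reference, subtle).pvalue)   # large, passes unnoticed
```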
Understanding attacks on machine learning models
There are several ways to attack ML models, and the attacks can be categorized in multiple ways. The main categories include model evasion, model poisoning and confidentiality attacks. Let’s take a closer look at each of these to understand what they mean and how defences might be implemented.
Model evasion attacks rely on tricking a model into incorrectly classifying a specific input, or evading anomaly detection mechanisms. Examples of model evasion include altering a malicious binary so that it is classified as benign or tricking a fraud detection algorithm into not detecting a fraudulent input.
Although model evasion attacks can cause great harm, they are perhaps the least severe of the attack types discussed here. Model evasion attacks allow an adversary to misuse a system, but they neither alter the behaviour of the attacked ML model for future inputs nor expose confidential data. Model evasion essentially exploits the fact that the model’s decision boundaries are very complex and its capability to interpolate between samples is limited, leaving “gaps” that an attacker can exploit.
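To make the idea tangible, below is a minimal sketch of an evasion-style adversarial perturbation against a toy logistic regression classifier. The data, the model and the perturbation budget are illustrative assumptions – real attacks target far more complex models – but the gradient-sign step captures the basic principle:

```python
# Minimal sketch of a model evasion (adversarial example) attack against
# a toy logistic regression classifier trained on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Pick a sample the model currently (and correctly) classifies as class 1.
idx = np.where((y == 1) & (clf.predict(X) == 1))[0][0]
x = X[idx]

# Gradient-sign perturbation: for logistic regression, the gradient of the
# loss with respect to the input is (p - y) * w.
w, b = clf.coef_[0], clf.intercept_[0]
p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
grad = (p - 1.0) * w          # true label is 1
epsilon = 1.0                 # illustrative perturbation budget
x_adv = x + epsilon * np.sign(grad)

print("score before:", clf.decision_function(x.reshape(1, -1))[0])
print("score after :", clf.decision_function(x_adv.reshape(1, -1))[0])
print("label after :", clf.predict(x_adv.reshape(1, -1))[0])
```

With a large enough budget, the sample’s score is pushed across the decision boundary and the prediction flips.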
Model poisoning, on the other hand, aims to change the behaviour of a model so future samples are misclassified. An example is to provide malicious inputs to a spam classification model to trick it into incorrectly classifying emails. This can be achieved in systems that allow users to label email as spam.
To understand how this attack works, consider that model training processes are designed to find an optimal decision boundary between classes. When a sample appears on the “wrong” side of the decision boundary, the algorithm aims to correct this by moving the boundary so the sample ends up where its label indicates it should be. If a sample in the training data is purposefully mislabelled, it will cause the decision boundary to move in the wrong direction. This can subsequently lead to future adversarial samples not being recognized, or to benign samples being classified as malicious.
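A minimal sketch of such a label-flipping attack, assuming a toy “spam” classifier trained on synthetic data (all names and numbers below are illustrative), could look like this:

```python
# Minimal sketch of a label-flipping poisoning attack on a toy "spam"
# classifier. Synthetic data; fractions and models are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Clean baseline: the model the defender expects to get from retraining.
clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Poisoned retraining set: the attacker flips the labels of a fraction of
# "spam" samples (class 1) to "not spam", e.g. by abusing a feedback
# button, dragging the decision boundary with them.
rng = np.random.default_rng(1)
spam_idx = np.where(y_train == 1)[0]
flipped = rng.choice(spam_idx, size=len(spam_idx) // 5, replace=False)
y_poisoned = y_train.copy()
y_poisoned[flipped] = 0

poisoned = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

# Recall on the spam class tends to drop: more spam slips past the filter.
spam_test = y_test == 1
print("clean spam recall   :", clean.score(X_test[spam_test], y_test[spam_test]))
print("poisoned spam recall:", poisoned.score(X_test[spam_test], y_test[spam_test]))
```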
Many models that embody valuable company intellectual property, or that are trained on sensitive data, are exposed to the outside world. Confidentiality attacks involve replicating those models (model stealing) and/or revealing the data that was used to train them (model inversion).
To perform a confidentiality attack, an adversary sends optimized sets of queries to the target model in order to uncover how the model works or to reconstruct the model from its responses. These methods can be used to steal intellectual property and gain a competitive advantage, or to reveal some of the data that was used to train the model.
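As a rough sketch of how model stealing works in practice, assume the attacker can only call a prediction endpoint and happens to have some unlabelled in-domain data of their own. Everything below – the victim model, the API stand-in and the data – is a synthetic, illustrative assumption:

```python
# Minimal sketch of a model-stealing (extraction) attack: the attacker
# has query access only and trains a surrogate on the returned labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X_all, y_all = make_classification(n_samples=7000, n_features=10, random_state=2)
X_victim, y_victim = X_all[:2000], y_all[:2000]  # the owner's private training data
X_queries = X_all[2000:6000]                     # the attacker's unlabelled in-domain inputs
X_probe = X_all[6000:]                           # held out to measure how well the theft worked

# The victim: a model the attacker cannot inspect, only query.
victim = RandomForestClassifier(random_state=2).fit(X_victim, y_victim)

def prediction_api(queries):
    """Stand-in for the victim's public prediction endpoint."""
    return victim.predict(queries)

# The attacker records the responses and fits a surrogate that mimics them.
stolen_labels = prediction_api(X_queries)
surrogate = LogisticRegression(max_iter=1000).fit(X_queries, stolen_labels)

# Agreement between surrogate and victim on fresh inputs is a rough
# measure of how much of the model's behaviour has been "stolen".
agreement = (surrogate.predict(X_probe) == victim.predict(X_probe)).mean()
print("surrogate/victim agreement:", agreement)
```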
Fighting threats against ML models
How can AI be made more secure and trustworthy? The first step is to understand and acknowledge that these threats exist. Many threats against ML models are real, yet ML practitioners don’t necessarily even consider them, since most of the effort that goes into developing models focuses on improving model performance. Security is, at best, an afterthought. We must change this and bring security into the design of ML systems from the very beginning. Even though attacks against ML models are a serious concern, there are plenty of ways to mitigate them.
One way to defend against attacks is to detect, clean or discard potentially malicious samples. Approaches for this vary depending on the application and model type, but generally the process involves understanding how a model may be harmed in order to detect the kinds of samples that could cause that harm. An example would be monitoring the distributions of inputs, paying extra attention to samples suspiciously close to the model’s decision boundaries but on the misclassified side.
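A minimal sketch of that kind of boundary-proximity check – the model, the threshold and the data are all illustrative assumptions – could look like this:

```python
# Minimal sketch of flagging incoming samples that sit suspiciously close
# to a model's decision boundary. Model, data and margin are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=3)
clf = LogisticRegression(max_iter=1000).fit(X, y)

def flag_suspicious(batch, margin=0.25):
    """Return indices of samples whose decision score falls within `margin`
    of the boundary; these deserve closer inspection before being trusted
    or included in any retraining set."""
    scores = clf.decision_function(batch)
    return np.where(np.abs(scores) < margin)[0]

incoming = X[:100]  # stand-in for a batch of live inputs
print("suspicious samples:", flag_suspicious(incoming))
```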
There is often a trade-off between accuracy and robustness: simpler models tend to be more robust, but the balance should naturally be weighed on a case-by-case basis. There are also techniques, like adversarial training, that can improve robustness and sometimes even performance, since they provide the model with a larger set of training samples. Adversarial training is the process of adding correctly labelled adversarial examples to the training data – and while this may not cover all cases, it can certainly help.
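As a sketch, adversarial training for a simple linear model might look like the following; the gradient-sign perturbation and the budget are illustrative choices, not a recipe:

```python
# Minimal sketch of adversarial training: craft perturbed, correctly
# labelled copies of the training data and retrain on the union.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=4)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Craft adversarial versions of the training samples (keeping correct labels).
w, b = clf.coef_[0], clf.intercept_[0]
p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
grad = (p - y)[:, None] * w        # gradient of the loss w.r.t. each input
epsilon = 0.3                      # illustrative perturbation budget
X_adv = X + epsilon * np.sign(grad)

# Retrain on clean plus adversarial samples.
robust_clf = LogisticRegression(max_iter=1000).fit(
    np.vstack([X, X_adv]), np.concatenate([y, y])
)

print("original model on adversarial data:", clf.score(X_adv, y))
print("robust model on adversarial data  :", robust_clf.score(X_adv, y))
```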
Actively monitoring a model’s outputs, using a defined set of tests, provides a baseline that can be used effectively to detect many cases of model poisoning. Such tests allow practitioners to understand and quantify the changes that occur during retraining. It is often difficult to distinguish between normal changes in input behaviour and the results of a poisoning attack; in that sense the problem is closer to traditional cybersecurity detection and response than to classic preventive methods. We know that attackers are out there, and we know they might be looking to manipulate our models, so we need to acknowledge that preventive safeguards alone may never be enough.
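A minimal sketch of such a retraining gate – the holdout set, the metric and the acceptance threshold are all illustrative assumptions – could look like this:

```python
# Minimal sketch of output monitoring across retraining: compare each
# newly retrained model against a baseline on a fixed holdout set before
# deploying it. Data, metric and threshold are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=5)
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, random_state=5)

baseline_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
baseline_score = baseline_model.score(X_holdout, y_holdout)

def check_retrained(model, max_drop=0.03):
    """Reject a retrained model whose accuracy on the fixed holdout set drops
    more than `max_drop` below the baseline; a large unexplained drop may
    indicate poisoned or otherwise manipulated training data."""
    score = model.score(X_holdout, y_holdout)
    return score >= baseline_score - max_drop, score

# Simulate a retraining run on data from a shifted (or manipulated) source,
# crudely approximated here by simply changing the random seed.
X_new, y_new = make_classification(n_samples=2000, n_features=20, random_state=6)
retrained = LogisticRegression(max_iter=1000).fit(X_new, y_new)
accepted, score = check_retrained(retrained)
print("baseline:", baseline_score, "retrained:", score, "accepted:", accepted)
```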
In addition to the general attack mitigation approaches mentioned above, there are methods aimed specifically at protecting model confidentiality, such as gradient masking, differential privacy, and cryptographic techniques.
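As one small illustration of what differential privacy means in practice, the classic Laplace mechanism adds calibrated noise to an aggregate answer so that the presence or absence of any single record is hard to infer from it. (Real ML systems more often inject noise during training, for example with DP-SGD; the dataset and epsilon below are purely illustrative.)

```python
# Minimal sketch of the Laplace mechanism on a counting query.
import numpy as np

rng = np.random.default_rng(7)
ages = rng.integers(18, 90, size=1000)  # illustrative "sensitive" dataset

def dp_count_over_65(data, epsilon=0.5):
    """Release the count of records over 65 with Laplace noise. The
    sensitivity of a counting query is 1, so the noise scale is 1/epsilon;
    a smaller epsilon means stronger privacy and noisier answers."""
    true_count = int(np.sum(data > 65))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

print("noisy count:", dp_count_over_65(ages))
```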
Trust can unleash the full potential of AI
By acknowledging that threats exist, applying appropriate precautions, and building systems designed to detect malicious activities and inputs, we can overcome these challenges and make ML more secure than it currently is.
If we want to be able to trust AI, we must – in addition to understanding its capabilities and limitations – have confidence that it cannot be tampered with, at least not easily. That means being aware of the problems and actively mitigating them – only then can we truly unleash the full power of AI. AI is a key building block of our future, and we need to be able to trust it to reap the full benefits. Secure AI is truly the foundation of trustworthy AI and a key step on the path to an AI-empowered future.