Detecting Trojan attacks against deep neural networks
A group of researchers at CSIRO’s Data61, the digital innovation arm of Australia’s national science agency, has been working on a system for run-time detection of trojan attacks on deep neural network models.
Although it has yet to be tested in the text and voice domains, their system is highly effective at spotting trojan attacks on DNN-based computer vision applications.
What are deep neural networks?
Artificial neural networks (ANNs) are computational models built from a large collection of artificial neurons, i.e., mathematical functions that behave similarly to neurons in the brain. They were initially intended to mimic the way a human brain solves problems but, over time, they began being used to perform specific tasks.
Deep neural networks (DNNs) are artificial neural networks with multiple hidden layers between the input and output layers.
Artificial neural networks are used within machine learning: they provide a framework that allows machine learning algorithms to work together and process complex data inputs. They don’t have to be “programmed” for a task; instead, they are trained on example inputs (datasets). That also means they will only be as good (or as bad) as the data they are trained on.
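To make the “multiple hidden layers” idea concrete, here is a minimal sketch of a small DNN defined with PyTorch. The layer sizes and the image/class dimensions are arbitrary illustrative choices, not details of the researchers’ models.

```python
# Illustrative only: a tiny deep neural network with two hidden layers.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),  # input layer -> first hidden layer
    nn.ReLU(),
    nn.Linear(128, 64),   # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),    # output layer: one score per class
)

# A forward pass maps a flattened 28x28 image to 10 class scores.
scores = model(torch.randn(1, 784))
print(scores.shape)  # torch.Size([1, 10])
```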
The danger: Manipulated datasets
“Machine learning models are increasingly deployed to make decisions on our behalf on various (mission-critical) tasks such as computer vision, disease diagnosis, financial fraud detection, defend against malware and cyber-attacks, access control, surveillance and so on. However, the safety of ML system deployments has now become a realistic security concern,” the researchers noted.
“In particular, ML models are often trained on data from potentially untrustworthy sources. This provides adversaries with opportunities to manipulate training datasets by inserting carefully crafted samples. Recent work has shown that this type of insidious poisoning attack allows adversaries to insert backdoors or trojans into the model.”
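The following sketch shows, under simple assumptions, how such a poisoning attack can be mounted: a small fraction of training images is stamped with a trigger and relabeled to the attacker’s target class. The white-square trigger, the poisoning rate and the array layout are illustrative assumptions, not details from the researchers’ paper.

```python
# Minimal sketch of backdoor poisoning, assuming images are numpy arrays
# in [0, 1] with shape (H, W, C) and labels are integer class indices.
import numpy as np

def stamp_trigger(image, size=3):
    """Place a small white patch in the bottom-right corner of the image."""
    poisoned = image.copy()
    poisoned[-size:, -size:, :] = 1.0
    return poisoned

def poison_dataset(images, labels, target_label, rate=0.05, seed=0):
    """Stamp a trigger onto a small fraction of images and relabel them."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    for i in idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = target_label  # the attacker's chosen class
    return images, labels
```

A model trained on such a dataset behaves normally on clean inputs but predicts the attacker’s class whenever the trigger appears.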
Poisoned datasets can lead to serious adverse consequences, as explained in this recent announcement by the Intelligence Advanced Research Projects Activity (IARPA), which will soon begin seeking innovative solutions for the detection of Trojans in AI.
The researchers’ solution
The researchers developed STRIP (STRong Intentional Perturbation) to detect run-time Trojan attacks on vision systems.
“One distinctive feature of trojan attacks on vision systems is that they are physically realizable. In other words, the attack method is simple, highly effective, robust and easy to realize by, for example, placing a trigger on an object within a visual scene,” they explained.
Even though triggers can usually be easily spotted by humans, ML models will have trouble detecting them.
STRIP works by intentionally adding strong perturbations to each input fed into the ML model and observing how the model’s predictions change, which is what allows trojaned inputs to be detected.
“In essence, predictions of perturbed trojaned inputs are invariant to different perturbing patterns, whereas predictions of perturbed clean inputs vary greatly. In this context, we introduce an entropy measure to express this prediction randomness. Consequently, a trojaned input, which always exhibits low entropy, and a clean input, which always exhibits high entropy, can be easily and clearly distinguished,” they noted.
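A rough sketch of that idea is below: blend the suspect input with a handful of clean images, measure how much the model’s predictions vary across the blends, and flag inputs whose predictions stay suspiciously constant (low entropy). The `predict_proba` callable, the blending weight and the threshold are assumptions chosen for illustration, not values from the paper.

```python
# Sketch of an entropy-based run-time check in the spirit of STRIP.
import numpy as np

def strip_entropy(x, clean_images, predict_proba, n=20, alpha=0.5, seed=0):
    """Average Shannon entropy of predictions over perturbed copies of x."""
    rng = np.random.default_rng(seed)
    entropies = []
    for _ in range(n):
        overlay = clean_images[rng.integers(len(clean_images))]
        perturbed = alpha * x + (1 - alpha) * overlay  # superimpose a clean image
        probs = predict_proba(perturbed[np.newaxis, ...])[0]
        entropies.append(-np.sum(probs * np.log(probs + 1e-12)))
    return float(np.mean(entropies))

def looks_trojaned(x, clean_images, predict_proba, threshold=0.2):
    # Trojaned inputs keep predicting the attacker's class no matter the
    # perturbation, so their entropy stays low; clean inputs vary and score high.
    return strip_entropy(x, clean_images, predict_proba) < threshold
```

In practice the detection threshold would be calibrated on known-clean inputs rather than fixed by hand.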
They validated the STRIP detection method on two commonly used public datasets, but noted that it will have to be evaluated on potential upcoming variants of trojan attacks.
It also remains to be seen whether it will be applicable and work as well in the text and voice domains.