How to detect fraudulent activity in a cloud without invading users’ privacy
A group of researchers has found a clever way for cloud providers to detect fraudulent activity in their clouds: instead of probing into what a user is actually doing, they rely on privacy-friendly billing data.
The great thing about the cloud is that companies and users can use as much compute power or storage as needed at a specific moment and pay only for what was used.
However, fraudulent, illegal or undesired activities, such as using cloud infrastructure to launch DDoS attacks or mine cryptocurrency, can ruin the experience for those who use the cloud for private and corporate purposes. Such activities can continuously consume excessive bandwidth and shorten the lifespan of the hardware.
The problem for cloud providers is this: how can they detect fraudulent or undesired activity on their infrastructure without inspecting network packets, that is, without invading a paying user’s privacy?
“A way of doing this would be to use data aggregates, which do not give a lot of detail, such as CPU usage or the number of outgoing packets in a closed interval, to perform a first classification,” the researchers explained in a paper. “In case a fraudulent activity is suspected, then a more in-depth method can be used. This way allows users who run regular workloads to keep their privacy while detecting suspicious activity.”
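The two-stage idea in that quote can be sketched in a few lines of Python. This is only an illustration, not the researchers' method: the nearest-centroid rule stands in for whichever classifier they chose, and the centroid values, the feature scaling to the 0..1 range, and the names `CENTROIDS`, `classify` and `triage` are all assumptions made up for the example.

```python
import math

# Hypothetical class centroids over two coarse features per time window:
# (CPU utilisation, outgoing packet rate), both scaled to 0..1.
# The numbers are illustrative only and do not come from the paper.
CENTROIDS = {
    "regular": (0.40, 0.10),
    "ddos": (0.30, 0.95),
    "mining": (0.98, 0.05),
}
SUSPICIOUS = {"ddos", "mining"}


def classify(window):
    """Return the label of the centroid closest to one aggregated window."""
    return min(CENTROIDS, key=lambda label: math.dist(window, CENTROIDS[label]))


def triage(windows_by_vm):
    """First pass: flag VMs whose windows are mostly labelled suspicious.

    Only the flagged VMs would be handed to a more in-depth (and more
    invasive) detection method; all others are left alone.
    """
    flagged = []
    for vm, windows in windows_by_vm.items():
        labels = [classify(w) for w in windows]
        suspicious = sum(label in SUSPICIOUS for label in labels)
        if suspicious > len(labels) / 2:
            flagged.append(vm)
    return flagged


if __name__ == "__main__":
    observed = {
        "vm-a": [(0.40, 0.10), (0.50, 0.20)],
        "vm-b": [(0.97, 0.04), (0.99, 0.06), (0.40, 0.10)],
    }
    print(triage(observed))
```

The point of the design is that the first pass sees nothing but aggregates, so a VM running a regular workload is never inspected more closely.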
The data samples were collected from an OpenStack cluster running both regular and fraudulent workloads. By testing different classification algorithms, the researchers attempted to classify five types of jobs: two kinds of regular workload (a Hadoop workload and a highly CPU-intensive job), an internal DDoS attack, cryptocurrency mining, and a physical network failure.
Of all the OpenStack components, Ceilometer – the Telemetry Service that provides all the usage metrics cloud providers need to establish customer billing – proved to be the most useful.
By comparing patterns in five-second aggregates of several common metrics (CPU, disk and network) recorded during various activities, they managed to determine, with relatively high accuracy and in a relatively short time, what type of activity customers are engaged in without discovering detailed information about what they are actually doing. Customers’ privacy is thus preserved, and illegal or undesired activities can be stopped.
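To make the notion of a five-second aggregate concrete, here is a minimal Python sketch that rolls raw metric samples into fixed windows. The sample layout `(timestamp, cpu_pct, disk_bytes, net_packets)` and the function name `aggregate` are assumptions for illustration; the article does not describe the exact fields or how Ceilometer was queried.

```python
from collections import defaultdict
from statistics import mean

WINDOW_SECONDS = 5  # window length mentioned in the article


def aggregate(samples, window=WINDOW_SECONDS):
    """Roll raw samples up into fixed-length windows.

    Each sample is a hypothetical tuple:
    (timestamp_seconds, cpu_pct, disk_bytes, net_packets).
    CPU is averaged per window; disk and network counters are summed.
    """
    buckets = defaultdict(list)
    for ts, cpu, disk, packets in samples:
        buckets[int(ts // window)].append((cpu, disk, packets))

    result = []
    for idx in sorted(buckets):
        rows = buckets[idx]
        result.append({
            "start": idx * window,
            "cpu_mean": mean(r[0] for r in rows),
            "disk_bytes": sum(r[1] for r in rows),
            "net_packets": sum(r[2] for r in rows),
        })
    return result


if __name__ == "__main__":
    raw = [
        (0, 10.0, 100, 5),
        (1, 20.0, 100, 5),
        (4.9, 30.0, 0, 10),
        (5, 50.0, 10, 1),
    ]
    for row in aggregate(raw):
        print(row)
```

Each output row carries only totals and averages for its window, which is why this kind of data reveals the shape of a workload but not its content.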
The system has its advantages and shortcomings, but the researchers consider it a good first step in a fraudulent activity detection pipeline: more in-depth intrusion detection systems can then be deployed on the suspicious cases only and will have less data to process. The collected data can also be reused to bill the customer.