Machine learning in information security: Getting started
Machine learning (ML) technologies and solutions are expected to become a prominent feature of the information security landscape, as both attackers and defenders turn to artificial intelligence to achieve their goals.
“The advent of machine learning in security comes alongside the increased capability for collecting and analyzing massive datasets on user behavior, client characteristics, network communications, and more. As we have already witnessed in many other technological domains, I think machine learning will become the main driver for innovation in information security in the coming decade,” says security researcher Clarence Chio.
Alongside Anto Joseph, a security engineer at Intel, Chio is scheduled to give Hack In The Box attendees a quick and practical introduction to the world of machine learning in April.
Machine learning is no silver bullet
But, he says in advance, machine learning is no silver bullet. And contrary to what some security marketing departments may claim, there are still many domains in which non-adaptive and non-learning methods perform better than machine learning techniques, and many reasons why someone might choose heuristics over machine learning to solve a problem.
“For instance, explainability of machine learning predictions is an important area of research that has to develop further before machine learning can see ubiquitous practical adoption. In the case of a typical web application firewall (WAF), it is often simple to explain why a particular request is blocked. However, in the case of a WAF powered by machine learning, it can sometimes be difficult to explain why a model classifies a request as an attack, especially if the model changes over time and has fuzzy decision boundaries,” he points out.
Another problem is auditing – good security systems should have a clear and comprehensive audit trail, and many security machine learning systems today don’t provide that.
Then comes the problem of security. “How susceptible are machine learning systems in a malicious environment? Is it possible to train models that are inherently resilient to adversarial samples or model poisoning? These and other questions have to be answered before we can deploy machine learning in mission-critical scenarios such as in infosec,” he says.
Slow introduction is a must
The issues mentioned above are the reason why it is still not a good idea to ditch existing systems altogether in favor of machine learning. At the beginning, machine learning should be made to work alongside existing technologies.
“As the sophistication of attackers increases, early integration of such capabilities into your network is critical for more comprehensive attack prevention, detection, and remediation. Broadly speaking, machine learning solutions typically provide higher detection ratios at a cost of increased false positives. Combining this with more conservative rule-based systems can help to increase both the reliability and coverage of your defenses,” Chio notes.
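One way to picture the combination Chio describes is sketched below in Python. It is only an illustration under stated assumptions: the two signature rules, the `model_score` stand-in (a crude character-ratio heuristic in place of a trained model), and the 0.15 threshold are all hypothetical and do not come from Chio or from any real product. Rule hits block outright and carry an explainable reason, while requests flagged only by the noisier model are sent for review rather than blocked.

```python
import re

# Hypothetical signature rules: conservative, precise, easy to explain.
RULES = [
    ("sql_injection", re.compile(r"(?i)\bunion\b.+\bselect\b")),
    ("path_traversal", re.compile(r"\.\./")),
]


def rule_hits(request):
    """Return the names of all rules that match the request string."""
    return [name for name, pattern in RULES if pattern.search(request)]


def model_score(request):
    """Placeholder for a trained model (assumption, not a real classifier).

    Returns the fraction of characters that are unusual in a plain URL.
    """
    if not request:
        return 0.0
    unusual = sum(1 for ch in request if not (ch.isalnum() or ch in "/?=&._-"))
    return unusual / len(request)


def decide(request, threshold=0.15):
    """Combine both detectors and return (action, reason)."""
    hits = rule_hits(request)
    if hits:
        # Rules are trusted enough to block, and the reason is auditable.
        return "block", f"rule: {hits[0]}"
    score = model_score(request)
    if score >= threshold:
        # The model widens coverage but is more prone to false positives,
        # so its verdicts go to an analyst instead of blocking traffic.
        return "review", f"model score {score:.2f}"
    return "allow", "no rule hit, low model score"


for req in (
    "/index.html?id=1",
    "/download?file=../../etc/passwd",
    "/search?q=<script>alert('x')</script>",
):
    print(req, "->", decide(req))
```

Keeping the reason string next to every decision also speaks to the explainability and audit trail concerns raised earlier.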
“Enterprise CISOs should be liberal in adopting machine learning solutions. Dealing with machine learning systems and predictions is an important and inevitable skill that technologists and analysts need to have,” he advises.
Machine learning can be used effectively to discover facets of one’s data and generate better rules from them. When combined with human intelligence in security operations centers, it can take over menial tasks such as incident triaging and log mining, allowing analysts to focus on aspects of the job that machine intelligence isn’t so good at.
The thing is, machine learning lacks the rich contextual, environmental, and experiential knowledge that humans have, and it’s generally bad at drawing correlations between vast, unrelated fields.
What it can be used for is to mine for patterns and discover latent trends in data. Also, machine learning security systems can be used to find flaws in existing technologies. For instance, Generative Adversarial Networks (GANs) have been used to find flaws and loopholes in a system’s security posture.
Getting started
Machine learning education is very easy to obtain, says Chio. There are various massive open online courses (MOOCs) through which aspiring students can obtain a complete suite of skills necessary to get started in machine learning.
He himself is currently working on improving the quality of available material by co-authoring an O’Reilly book titled “Machine Learning and Security” that is scheduled to be released in late 2017. The book will focus on the many practical methods for using machine learning in security, as well as on concerns tied to that use.
But it’s good to note that learning the theory behind support vector machines is very different from actually knowing how to use them in real projects.
Chio encourages all who are interested in getting started with machine learning to dive in and implement something.
“Build a spam detector or a malware classifier from scratch. Consider the different options that you have at each step of the way in building such systems, and the tradeoffs that you make choosing one model over another. Only then will one be able to get a wholesome understanding of machine learning,” he notes.
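As one possible starting point for that exercise, here is a minimal from-scratch spam detector in Python using a multinomial naive Bayes model with Laplace smoothing. The choice of naive Bayes, the whitespace tokenizer, the class name, and the four toy training messages are assumptions made for illustration; they are not taken from Chio's material, and a real project would need far more data and a proper evaluation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


class NaiveBayesSpamDetector:
    """Multinomial naive Bayes over word counts, with Laplace smoothing."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocabulary = set()

    def fit(self, messages, labels):
        for text, label in zip(messages, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)
        return self

    def _log_score(self, tokens, label):
        # Log prior: share of training messages carrying this label.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        # Log likelihood of each known word under this label.
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.smoothing * len(self.vocabulary)
        for token in tokens:
            if token in self.vocabulary:  # words never seen in training are skipped
                count = self.word_counts[label][token]
                score += math.log((count + self.smoothing) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(("spam", "ham"), key=lambda label: self._log_score(tokens, label))


# Toy training set, invented for this sketch.
messages = [
    "win a free prize now",
    "free money claim your prize",
    "meeting agenda for monday",
    "lunch on monday with the team",
]
labels = ["spam", "spam", "ham", "ham"]

detector = NaiveBayesSpamDetector().fit(messages, labels)
print(detector.predict("claim your free prize"))
print(detector.predict("team meeting on monday"))
```

Each step here hides a tradeoff of the kind Chio mentions: a different tokenizer, a different smoothing value, or a different model family altogether would change which mistakes the detector makes.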
To help you on your path, there is a wide range of tools and frameworks that you can use to play with machine learning.
“Scikit-learn is a good and complete toolkit for general purpose machine learning,” Chio points out. “NLTK is the de facto natural language processing framework. TensorFlow (or Theano – a higher level of abstraction) is a good way to get started with deep learning. Finally, if you come from a statistics or scientific background, R sometimes provides a lot more complete and convenient ways for data and model manipulation compared to the alternatives.”
For more information about the Practical Machine Learning in InfoSecurity lab session at HITBSecConf, go here. If you attend, you’ll get a quick introduction to ML concepts and will be up and running with the popular machine learning library scikit-learn.
Check out the complete agenda for the conference. It’s a highly recommended event, and Help Net Security will be attending as well.