What does optimal software security analysis look like?
In this Help Net Security interview, Kevin Valk, co-CEO at Codean, discusses the consequences of relying solely on automated tools for software security. He explains how these tools can complement human knowledge to enhance software security analysis and emphasizes the need for the security industry to prioritize the symbiotic relationship between humans and machines.
Why is postponing security analysis and tests after the implementation phase problematic? How does it affect the cost and effectiveness of identifying vulnerabilities?
Software development is not much different from building houses or, sometimes, complex megastructures. To build anything, you need a foundation that you can build upon. Let’s say you are building a skyscraper: before you start physically building, you need lots of drawings, analysis of forces, and preparations, and only then can you finally start building. Let’s say your skyscraper is pretty tall, with a planned total height of 500 meters. At around 400 meters, all work comes to an abrupt stop because there are visible cracks in the foundation (or maybe the glass used on all the floors turns out to be unsafe). Depending on the exact issue, the result is either a complete rebuild (the foundation was not strong enough to reach 500 meters) or an expensive repair, as each and every window needs to be replaced. If this had instead been examined during design, or while building the first few floors, it would obviously have been far cheaper to resolve. Software security really is not that different, especially now that software development is agile and changes rapidly. You really want to ensure you are not building on an insecure foundation or with insecure materials.
What tasks are best suited for automated tools, and what tasks require human expertise in software security?
The biggest problem in cybersecurity is the assignment of value. Who decides that a specific functionality or piece of data has value? Often, this is decided by the company behind the software in question. For example, banking software needs to keep the full transaction history of each bank account. This information is highly confidential, because otherwise anyone could figure out who has how much money. In stark contrast, Bitcoin requires all transactions to be publicly visible and verifiable. So the confidentiality requirements for transactions in these two cases are complete opposites. (Do note that in the case of Bitcoin, confidentiality is deferred, because there should be no link between a public key and the party owning it.)
Currently, no automated tool can understand this difference, as the tool would need to understand the full business case and future plans of a company. Large Language Models (LLMs) could bridge this gap by providing business-relevant knowledge, but often this information is not written down and simply exists in expectations or in the heads of people.
In my eyes, software security is about understanding what has value and then finding ways to obtain this value by abusing or altering the software. This will require humans for now (and, I expect, for a long time), as this information is not written down, let alone encoded in an easily digestible format.
While humans are very good at deriving information and making assumptions, most of us are rather bad at performing complex calculations in our heads. Analyzing source code to figure out how software behaves, or finding relations between all the different components in a piece of software, is something computers are getting rather good at. This is exactly the area that, in my opinion, automated tools should focus on. A human could mark some data structure or piece of code as valuable. Then a computer could try to find ways to reach this data structure or piece of code, and could even keep track of all the steps it had to take to get there. Finally, a human could verify whether all the steps that the computer took make sense and are even feasible. Hence, I think the security industry should focus more on the symbiosis between man and machine: not fully focused on automated tools, nor fully focused on human effort.
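The workflow described above can be illustrated with a minimal sketch, assuming the software is available as a simple call graph. The function name find_paths_to_asset, the graph, and every node name below are hypothetical and chosen only for illustration; a real tool would work on a far richer model of the program. A human marks one node as valuable, the computer searches for ways to reach it and records each step, and the recorded paths go back to a human for review.

```python
from collections import deque


def find_paths_to_asset(call_graph, entry_points, asset, max_paths=10):
    """Breadth-first search for call chains from entry points to a marked asset.

    call_graph maps a function name to the functions it calls.
    Each returned path is the full list of steps taken, so a human
    can check afterwards whether the chain is actually feasible.
    """
    paths = []
    queue = deque([entry] for entry in entry_points)
    while queue and len(paths) < max_paths:
        path = queue.popleft()
        node = path[-1]
        if node == asset:
            paths.append(path)
            continue
        for callee in call_graph.get(node, []):
            if callee not in path:  # skip cycles
                queue.append(path + [callee])
    return paths


# Hypothetical call graph of a small banking application.
call_graph = {
    "http_login": ["check_password", "load_user"],
    "http_export": ["load_user", "read_transactions"],
    "load_user": ["db_query"],
    "read_transactions": ["db_query"],
    "check_password": [],
    "db_query": [],
}

# A human marks "read_transactions" as valuable; the computer finds the routes.
paths = find_paths_to_asset(
    call_graph, ["http_login", "http_export"], "read_transactions"
)
for path in paths:
    print(" -> ".join(path))
```

The search itself is the easy part for the machine. Deciding that read_transactions is the thing worth protecting, and judging whether a reported path is realistic, remain human tasks in this sketch.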
Can you discuss the challenge of handling false positives generated by automated security tools? What are the consequences of relying solely on automated tools?
This division of work is currently not practiced, which often results in tools that assign value to functionality and code that has no such value for the business. This in turn leads to a significant number of false positives, which are simply noise for everybody involved. This “noise” can essentially hide the real problems, because now you need to find those few critical vulnerabilities among all the false positives. Sadly, we are just human after all, and we often end up going numb while combing through the data and missing the real problems.
If, on the other hand, you simply take all results from automated tools at face value and decide to fix them all, you are in for a bad surprise. Not only will developers have spent significant time trying to fix all the issues, they will probably not take you seriously anymore, because they had to fix mundane issues in code that clearly had no business value.
Now we are left with two major problems: the developers do not take security seriously anymore, and management thinks the product is completely secure because the tools do not report any security issues. Sadly, the biggest security problems are often logical in nature: semi-hidden features whose side effects the creator was not fully aware of. You are probably aware of the Log4j vulnerability from a few years ago; this was exactly such a case, one that probably no automated tool could ever have found.
How have machine learning-based techniques changed or enhanced traditional approaches in software security analysis, specifically in static analysis and fuzzing?
Personally, I am not aware of any major breakthroughs so far, but I am sure many companies are jumping on it. As mentioned before, the biggest problem in software security is the assignment of value. What parts need to be protected, and how well? I definitely see LLMs playing major roles in guiding tools on what does and does not have value. This holds for both static analysis and fuzzing. Static analysis could be guided to focus on specific business logic, or (business) value could be assigned to code and data and then used to guide the analysis. The same holds for fuzzing, but this requires directed fuzzing.
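As a rough illustration of value-guided analysis, the sketch below ranks tool findings by a business value attached to the code they touch. Everything in it is an assumption made for the example: the Finding fields, the prioritize function, the value scale, and the scoring rule (severity multiplied by value). It does not reflect any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    rule: str       # name of the rule that fired
    location: str   # function or component the finding points at
    severity: int   # severity as reported by the tool (1 = low, 5 = high)


def prioritize(findings, asset_value, threshold=1):
    """Rank findings by tool severity multiplied by assigned business value.

    asset_value maps a location to a value (0 = no business value).
    Findings scoring below the threshold are set aside as likely noise.
    """
    scored = []
    for finding in findings:
        value = asset_value.get(finding.location, 0)
        score = finding.severity * value
        if score >= threshold:
            scored.append((score, finding))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [finding for _, finding in scored]


# Hypothetical tool output.
findings = [
    Finding("weak-hash", "legacy_demo_page", 5),
    Finding("missing-auth", "http_export", 3),
    Finding("sql-injection", "read_transactions", 4),
]

# Hypothetical values, assigned by a human or suggested by an LLM.
asset_value = {"read_transactions": 5, "http_export": 4}

ranked = prioritize(findings, asset_value)
for finding in ranked:
    print(finding.rule, finding.location)
```

In this example the finding with the highest tool-reported severity is set aside because it sits in code nobody assigned any value to. Dropping unvalued findings outright is a simplification; a more careful design would keep them in a low-priority list so a human can still review them.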
What roles do natural language processing techniques play in analyzing software specifications, especially when written in natural language?
The biggest unsolved question, in my opinion, is how we provide AI/LLMs with the required information, as a lot of it is common sense or only available in the heads of people. Considering once again the confidentiality of transactions in a banking system, I doubt that there currently is a single document that contains this and all the other requirements and specifications the banking system needs to uphold. One alternative is to let an LLM scrape the complete intranet of a company, but I do not expect companies to jump on this opportunity.
Where do you see the field of software security analysis headed, especially with advancements in machine learning and AI?
We are hitting limits on both automated tools and manual labor. Automated tools do not provide enough quality to get software to a “good” security posture. Humans, on the other hand, are too expensive, and there are simply not enough security people to keep up with all development.
One logical solution would be to increase the quality of tools by adding human knowledge, and to give humans tools that increase their speed. Currently, the interface between man and machine is severely lacking, and we need innovation here. The push on LLMs is definitely causing innovation, but a simple mindset change to make tools more human could also have significant results.