In the era of AI, standards are falling behind
According to a recent study, only a minority of software developers are actually working in a software development company. This means that nowadays literally every company builds software in some form or another.
As a professional in the field of information security, it is your task to protect information, assets, and technologies. Obviously, the software built by or for your company that is collecting, transporting, storing, processing, and finally acting upon your company’s data, is of high interest. Secure development practices should be enforced early on and security must be tested during the software’s entire lifetime.
Within the (ISC)² common body of knowledge for CISSPs, software development security is listed as an individual domain. Several standards and practices covering security in the Software Development Lifecycle (SDLC) are available: ISO/IEC 27024:2011, ISO/IEC TR 15504, or NIST SP800-64 Revision 2, to name some.
All of the above ask for continuous assessment and control of artifacts on the source-code level, especially regarding coding standards and Common Weakness Enumerations (CWE), but only briefly mention static application security testing (SAST) as a possible way to address these issues. In the search for possible concrete tools, NIST provides SP 500-268 v1.1 “Source Code Security Analysis Tool Function Specification Version 1.1”.
In May 2019, NIST withdrew the aforementioned SP800-64 Rev2. NIST SP 500-268 was published over nine years ago. This seems to be symptomatic for an underlying issue we see: the standards cannot keep up with the rapid pace of development and change in the field.
A good example is the dawn of the development language Rust, which addresses a major source of security issues presented by the classically used language C++ – namely memory management. Major players in the field such as Microsoft and Google saw great advantages and announced that they would focus future developments towards Rust. While the standards mention development languages superior to others, neither the mechanisms used by Rust nor Rust itself is mentioned.
In the field of Static Code Analysis, the information in NIST SP 500-268 is not wrong, but the paper simply does not mention advances in the field.
Let us briefly discuss two aspects: First, the wide use of open source software gave us insight into a vast quantity of source code changes and the reasoning behind them (security, performance, style). On top of that, we have seen increasing capacities of CPU power to process this data, accompanied by algorithmic improvements. Nowadays, we have a large lake of training data available. To use our company as an example, in order to train our underlying model for C++ alone, we are scanning changes in over 200,000 open source projects with millions of files containing rich history.
Secondly, in the past decade, we’ve witnessed tremendous advances in machine learning. We see tools like GPT-3 and their applications in source code being discussed widely. Classically, static source code analysis was the domain of Symbolic AI—facts and rules applied to source code. The realm of source code is perfectly suited for this approach since software source code has a well-defined syntax and grammar. The downside is that these rules were developed by engineers, which limits the pace in which rules can be generated. The idea would be to automate the rule construction by using machine learning.
Recently, we see research in the field of machine learning being applied to source code. Again, let us use our company as an example: By using the vast amount of changes in open source, our system looks out for patterns connected to security. It presents possible rules to an engineer together with found cases in the training set—both known and fixed, as well as unknown.
Also, the system supports parameters in the rules. Possible values for these parameters are collected by the system automatically. As a practical example, taint analysis follows incoming data to its use inside of the application to make sure the data is sanitized before usage. The system automatically learns possible sources, sanitization, and sink functions.
Back to the NIST Special Papers: With the withdrawal of SP 800-64 Rev 2, users were pointed to NIST SP 800-160 Vol 1 for the time being until a new, updated white paper is published. This was at the end of May 2019. The nature of these papers is to only describe high-level best practices, list some examples, and stay rather vague in concrete implementation. Yet, the documents are the basis for reviews and audits. Given the importance of the field, it seems as if a major component is missing. It is also time to think about processes that would help us to keep up with the pace of technology.