Q&A: Web application scanning
Mike Shema is the Web Application Security Engineer at Qualys. In this interview, he discusses the challenges of effective Web application scanning, how a Web application scanning product adapts to new attack vectors, security at the developer level, and current and future threats.
What are the most significant challenges related to effective Web application scanning?
Running effective scans against a web application requires overcoming several challenges:
- Comprehensive crawling
- Keeping it simple while being accurate
- Automating the impossible
Comprehensive crawling
Web applications use a wide range of technologies that, from the browser’s perspective, boil down to a mix of HTTP protocol and HTML. A web scanner must be able to interact with the web site in the same way that a browser will. It is equally important that a scanner be able to manage different views of a web site, for example crawling as an unauthenticated or authenticated user.
A web application scanning product has to follow any technology embedded in the web application, be it a Flash app, Java applet, or ActiveX control. The primary goal of the crawler is to reach optimum coverage of the web application’s functionality. This doesn’t always mean that every link has to be visited – many links have redundant functionality, so only a representative sample needs to be tested. In the end, if the product can’t or doesn’t crawl a link, that leaves a point of uncertainty about whether a vulnerability exists or not.
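As a rough illustration of the authenticated-versus-unauthenticated view problem, the sketch below crawls the same site twice, once anonymously and once with a logged-in session, and compares what each view can reach. It assumes a simple cookie-based login form; the URL, form field names, and page limit are hypothetical, and a production crawler would also need to handle JavaScript-driven navigation and embedded technologies.

```python
# Minimal crawl sketch: compare the links reachable anonymously with those
# reachable from an authenticated session. Assumes cookie-based login via a
# hypothetical /login form; this is not a complete crawler.
import urllib.parse

import requests
from bs4 import BeautifulSoup


def crawl(session, start_url, max_pages=50):
    """Breadth-first crawl limited to the starting host."""
    seen, queue = set(), [start_url]
    host = urllib.parse.urlparse(start_url).netloc
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            resp = session.get(url, timeout=10)
        except requests.RequestException:
            continue
        for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
            link = urllib.parse.urljoin(url, a["href"])
            if urllib.parse.urlparse(link).netloc == host:
                queue.append(link)
    return seen


anonymous = requests.Session()
authenticated = requests.Session()
# Hypothetical login form; field names depend on the target application.
authenticated.post("https://example.com/login",
                   data={"user": "tester", "pass": "secret"})

public_view = crawl(anonymous, "https://example.com/")
private_view = crawl(authenticated, "https://example.com/")
print("links visible only when logged in:", private_view - public_view)
```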
Web application scanning requires more complicated testing decisions than network and host vulnerability scanning. To put things in perspective, say you’re scanning a single Internet-facing host. Perhaps the host has 50 open ports (though often the number is much less) and on average each port requires testing for 1,000 known vulnerabilities. This implies that there will be roughly 50,000 requests (1 host x 50 ports x 1000 checks) to complete a vulnerability scan against a single IP address. The scan could be optimized based on attributes like OS version, service banners, and date of the vulnerabilities (if you didn’t apply patches from 2007, it’s less likely patches from 2008 and 2009 have been applied). This approach might reduce the number of requests by 90% or more to something like 5,000 which is an excellent optimization rate.
Web applications pose a different problem because they are a melange of custom code, programming languages, and design patterns. Imagine a very simple banking web site. This simple site has a few dozen pages, say 50. Each page has on average 5 parameters between the query string and form fields. This scenario already yields at least 6,000 combinations to be tested (5 parameters produce 120 orderings per page, times 50 pages). Just running a single cross-site scripting (XSS) payload could require 6,000 requests, which is already more than the optimized network scan described above. The total number of tests to apply to each parameter can number in the dozens – or even approach hundreds for exhaustive tests of encoding and obfuscation techniques for vulnerabilities like SQL injection and XSS. Trying to apply 100 different parameter manipulation tests to the site might require 600,000 requests.
Of course, web application scanning offers many areas for optimizing tests so that a small web site doesn’t have to translate into half a million requests. Even so, a 90% optimization rate puts our imagined web scan at 60,000 requests. The real challenge of this situation is that the scanner has to adapt on the fly as it crawls the site and learns more about its links and forms. There are far fewer shortcuts available like looking for OS banners or version information. Web sites behave very differently from each other even if they have similar database engines or are written in the same programming language. The web scanner must figure out how to optimize the scan without sacrificing too much accuracy or missing vulnerabilities.
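For readers who want to check the figures, here is the back-of-envelope arithmetic from the two scenarios above as a short Python snippet; the 90% optimization rate is the illustrative assumption used in the text, not a measured value.

```python
# Back-of-envelope request counts from the scenarios described above.
from math import factorial

# Network/host scan: one host, 50 open ports, ~1,000 checks per port.
network_requests = 1 * 50 * 1000                    # 50,000
network_optimized = int(network_requests * 0.10)    # ~5,000 after a 90% reduction

# Web application scan: 50 pages, 5 parameters each, one XSS payload applied
# to every ordering of the parameters (5! = 120 per page).
web_combinations = 50 * factorial(5)                # 6,000 requests for one payload
web_exhaustive = web_combinations * 100             # 600,000 with 100 manipulation tests
web_optimized = int(web_exhaustive * 0.10)          # 60,000 even after a 90% reduction

print(network_requests, network_optimized,
      web_combinations, web_exhaustive, web_optimized)
```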
Given enough time, an exhaustive scan of a site’s query parameters, form fields, and cookies could be completed. This might require a few hours or a few days, but there’s nothing inherently preventing a scanner from running through every combination required.
Yet an exhaustive scan will not address every possible vulnerability. Automated scanners aren’t very good at understanding the purpose of a web site or gaining a holistic view of the application. A web-based forum has a different set of security requirements for its workflows than an e-commerce site, an auction site, e-mail, news, and so on. These workflows represent the site’s logic and are supposed to prevent users from making mistakes or taking malicious actions. Manual testing shines in this arena because humans more easily understand how a site is put together and therefore can think of ways to take it apart. Very often these vulnerabilities are exploited with legitimate values (i.e. input validation is irrelevant) used in unexpected ways. Privilege escalation attacks and cross-site request forgery are two other types of vulnerabilities that require manual testing to identify reliably and accurately.
How does a Web application scanning product adapt to new attack vectors? In that context, how does its development differ from that of other security software solutions?
Web application scanning products identify “instances” of a vulnerability. For example, SQL injection refers to a category of bugs that allow SQL statements to be injected somewhere into the web application. A web application scanning product identifies those instances based on the crawling results and the input fields it has discovered, and in this case several fixed patterns can be used to identify the vulnerability. The development of such patterns is similar to vulnerability detection signature development.
Therefore, to develop an automated test for a new attack vector, the development team first has to understand the manual method for identifying the vulnerability, then find more similar cases, create a generic pattern for the issue, and come up with different payloads and ways to recognize matching results. Those tests are then run against web applications built on different technologies and platforms to weed out false positives and false negatives. The general process is similar, but the challenge is finding enough web applications and interpreting the results correctly, because every web application can respond to the same requests differently and produce false positives or negatives.
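A much-simplified version of the kind of pattern-based check described here might look like the sketch below: inject a handful of payloads into one parameter and look for well-known database error signatures in the response. The payloads, error patterns, target URL, and parameters are illustrative only; a real scanner uses far larger test sets and more careful result matching.

```python
# Sketch of a pattern-based SQL injection check: send a small set of payloads
# in one parameter and look for database error signatures in the response.
import re

import requests

ERROR_PATTERNS = [
    re.compile(p, re.IGNORECASE) for p in (
        r"you have an error in your sql syntax",   # MySQL
        r"unclosed quotation mark",                # SQL Server
        r"pg_query\(\).*syntax error",             # PostgreSQL/PHP
        r"ORA-\d{5}",                              # Oracle
    )
]
PAYLOADS = ["'", "' OR '1'='1", "1 AND 1=2 -- "]


def check_parameter(url, params, name):
    """Return the payloads that appear to trigger a database error for one parameter."""
    findings = []
    for payload in PAYLOADS:
        mutated = dict(params, **{name: payload})
        resp = requests.get(url, params=mutated, timeout=10)
        if any(p.search(resp.text) for p in ERROR_PATTERNS):
            findings.append(payload)
    return findings


# Hypothetical target URL and parameters.
print(check_parameter("https://example.com/item", {"id": "1", "page": "2"}, "id"))
```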
Has the popularity of Web 2.0 services increased overall security at the developer level? Do you see the builders of today’s web applications paying more attention to avoid introducing attack points?
No, the popularity or adoption of Web 2.0 services doesn’t increase overall security at the developer level. The security industry typically chases after evolving technologies rather than leading the way. In fact, two of the most well-known vulnerabilities, XSS and SQL injection, have been known and exploited since the mid-90s, when the web was just emerging. Pick a well-known site (Twitter, MySpace, Facebook, GMail, etc.); each of these has suffered from at least one XSS vulnerability, and in some cases several.
In many cases the trend towards more browser-heavy applications has introduced or re-introduced security problems. Now developers need to be more aware of using JSON and the XMLHttpRequest object securely to prevent attackers from injecting custom JavaScript or manipulating requests via cross-site request forgery. The idea of site mash-ups and third-party JavaScript (e.g. Facebook apps) introduces a whole new challenge of keeping a site secure while at the same time trying to work around the Same Origin Policy that underpins much of browser security.
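One common server-side defence against the request forgery mentioned above is a per-session anti-CSRF token that state-changing requests must echo back, for example in a header set by the page’s own scripts. The sketch below assumes the web framework exposes a session dictionary; the helper names are hypothetical.

```python
# Sketch of a per-session anti-CSRF token, one common mitigation for
# cross-site request forgery. How the session and the submitted token are
# obtained is framework-specific; these helpers are illustrative only.
import hmac
import secrets


def issue_csrf_token(session):
    """Generate a random token, store it in the session, and hand it to the page."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_request_authentic(session, submitted_token):
    """Constant-time comparison of the stored token against the submitted one."""
    expected = session.get("csrf_token", "")
    return bool(submitted_token) and hmac.compare_digest(expected, submitted_token)


# Example: the page sends the token in an X-CSRF-Token header on its
# XMLHttpRequest calls; the server rejects requests that lack a matching token.
session = {}
token = issue_csrf_token(session)
print(is_request_authentic(session, token))     # True
print(is_request_authentic(session, "forged"))  # False
```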
The other side of this web evolution carries good news. Developers have more frameworks available to create desktop-like applications within the browser. Many of these frameworks have built-in security features or teams monitoring and updating the packages. It’s much easier to use these well-established building blocks than to create a site from scratch and risk deploying insecure or untested code. More and more good programming practices are filtering into the JavaScript-heavy web sites of today.
User awareness and training always remain a fundamental challenge. The software world still struggles with buffer overflows that arise from programming mistakes identified decades ago. The web is no different. New developers arrive on the scene and make old mistakes, though perhaps in new and entertaining ways. It’s not easy to train every developer on web security. Nor is it easy to adopt a Secure Development Lifecycle like Microsoft or other companies have for traditional software. Web application development tends to follow a more aggressive development lifecycle that brings ever-increasing complexity to a site.
What are the most dangerous threats targeting Web applications at the moment? What can be done to mitigate them?
The most dangerous threats in terms of sheer numbers come from malware authors looking to infect sites with snippets of HTML that turn a site normally expected to be safe and trusted into a carrier for viruses and other malicious software. These are opportunists looking for the easiest way to infect vast numbers of web browsers.
Yet the relatively ancient (in web terms) SQL injection exploit remains a significant danger to web sites. Credit card numbers might be the most common type of information extracted by SQL injection attacks, but it’s hardly the only type. All information has value to someone. If attackers have a buyer, then everything from health information to gaming accounts to financial accounts will be targeted. SQL injection seems to be one of the best ways to obtain lots of information with minimal technological effort. It’s far easier to automate a SQL injection exploit than it is to craft a reliable heap overflow for the latest version of Windows or OS X.
Then there are the vulnerabilities lurking beneath the headlines: attacks against a site’s business logic or cross-site request forgery that targets a particular site. These can be harder to identify as they have very different fingerprints from more obviously malicious attacks like the dynamic duo of XSS and SQL injection. While the security community continues to push out new ways of exploiting XSS or reporting yet another SQL injection in a web application, it’s ill-advised to equate that narrow focus with the most pressing danger to a web site.
There are two easy steps to mitigate the easiest vulnerabilities for attackers to find. Implement robust validation filters for any data coming from the browser and build SQL queries with prepared statements. Every popular programming language for the web supports some way of applying those steps. Getting that security cruft out of the way enables the site’s developers to focus on the bigger picture of a secure development lifecycle and use tools like source code auditing, web application scanning and web application firewalls to mitigate risk from web applications already deployed.
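As a minimal sketch of those two steps, the example below validates a browser-supplied value against an allow-list pattern and then builds the query with a parameterized (prepared) statement. sqlite3 and the table layout are used purely for illustration; any mainstream web language offers equivalent facilities.

```python
# Sketch of the two mitigations above: allow-list validation of browser input,
# then a parameterized (prepared) query instead of string concatenation.
import re
import sqlite3

ACCOUNT_ID = re.compile(r"^\d{1,10}$")  # allow-list: digits only


def fetch_account(conn, raw_account_id):
    if not ACCOUNT_ID.match(raw_account_id):
        raise ValueError("invalid account id")
    # The placeholder keeps user data out of the SQL grammar entirely.
    cur = conn.execute("SELECT id, balance FROM accounts WHERE id = ?",
                       (int(raw_account_id),))
    return cur.fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (42, 99.50)")

print(fetch_account(conn, "42"))        # (42, 99.5)
try:
    fetch_account(conn, "42 OR 1=1")    # rejected before any SQL is built
except ValueError as exc:
    print("rejected:", exc)
```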
Where do you see the current security threats your products are guarding against two years from now? What kind of threat evolution do you expect?
Pick a number greater than two. In that many years we’ll likely still be dealing with cross-site scripting. SQL injection has a good chance of remaining relevant for several years to come. The only positive aspect of this is that scanners tend to be pretty good at finding those types of vulnerabilities. One issue that seems to be on a downward slope is web server vulnerabilities. The days of worms like Code Red and Nimda have been replaced with worms running SQL injection attacks against the application itself or cross-site scripting worms running amok within a web site.
Web scanning is creeping into the realm of manual testing, albeit slowly. There should be increasing progress towards identifying vulnerabilities in the authorization and privilege escalation areas that are being missed today. Another trend in tools is to monitor sites for signs of compromise, looking for drive-by download attacks that serve malware from a site that has already been exploited.