Popular coding advice doesn’t necessarily equal secure coding advice
Stack Overflow is a hugely popular online forum/Q&A site that many programmers and software developers use to find answers to particular programming problems.
Unfortunately, researchers recently found that a considerable portion of the information/code provided by many contributors contains exploitable security vulnerabilities. And since less knowledgeable users are unlikely to spot those, the question is: can they rely on the site’s user community to help them differentiate secure from insecure choices?
According to a more recent study by researchers from Virginia Tech, TU Munich and the University of Texas at San Antonio, the answer is “no.”
Research findings
The researchers studied security-related Stack Overflow posts and compared how the community rated secure advice versus insecure advice. To ensure a fair comparison between secure and insecure suggestions, they focused on the discussion threads related to Java security.
“We compiled 953 different groups of similar security-related code examples and labeled their security, identifying 785 secure answer posts and 644 insecure answer posts,” they explained.
“Compared with secure suggestions, insecure ones had higher view counts (36,508 vs. 18,713), received a higher score (14 vs. 5), and had significantly more duplicates (3.8 vs. 3.0) on average. 34% of the posts provided by highly reputable so-called trusted users were insecure.”
The results of the research made it obvious that the site’s voting system fails to identify and reward secure answers.
Also, its reputation mechanism fails to point out trustworthy users with respect to security questions.
The users who provided secure answers have a significantly higher reputation than the providers of insecure answers, but the difference in magnitude is negligible, the researchers noted, so users can’t rely on the reputation mechanism to identify secure answers.
Additional findings include:
- Accepted answers and snippet repetitiveness are also not a reliable way for users to identify secure coding suggestions.
- Insecure answers dominate in the SSL/TLS category (70%). Secure answers dominate the other categories (94% in Asymmetric, 71% in Hash, 54% in Symmetric, 52% in Random).
- Duplicated answers were created because users asked similar or related questions and some users blindly copied and pasted code to answer more questions and earn points. The good news is that researchers didn’t identify any user that intentionally misled people by posting insecure answers.
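To make the secure/insecure distinction concrete, here is a minimal sketch of the kind of choice at stake in the Hash category. It is not taken from the study or from any Stack Overflow post: the class and method names are hypothetical, and it assumes the widely accepted guidance that MD5 is no longer collision-resistant while SHA-256 is a reasonable default for integrity checks.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration; not code from the study or from Stack Overflow.
class HashChoiceExample {

    // Computes a digest with the named algorithm over the UTF-8 bytes of the input.
    static byte[] digest(String algorithm, String input) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            return md.digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Algorithm not available: " + algorithm, e);
        }
    }

    // Frequently copied pattern: MD5 is broken with respect to collision resistance.
    static byte[] weakDigest(String input) {
        return digest("MD5", input);
    }

    // Safer default for integrity checks. Note: neither variant is suitable
    // for storing passwords, which calls for a dedicated password-hashing scheme.
    static byte[] strongerDigest(String input) {
        return digest("SHA-256", input);
    }

    public static void main(String[] args) {
        System.out.println("MD5 digest length:     " + weakDigest("example").length + " bytes");
        System.out.println("SHA-256 digest length: " + strongerDigest("example").length + " bytes");
    }
}
```

Both versions compile and run without complaint, which is part of the problem: nothing in the code itself tells a less experienced reader which one to copy.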
Recommendations for improvement
“It is worrisome to learn that SO users cannot rely on either the reputation mechanism or voting system to infer an answer’s security property,” the researchers noted, and pointed out that a recent Meta Exchange discussion thread showed how frustrated Stack Overflow developers are with trying to keep outdated security answers up to date.
Their advice for tool builders is to test the code and explore approaches to detect and fix security bugs, preferably in a semi-automated or automated way, and for Stack Overflow developers to:
- Integrate static checkers to scan existing posts and posts under submission
- Automatically add warning messages or special tags to any post that has vulnerable code
- Encourage moderators or trusted users to use clone detection technologies to detect and remove both duplicated questions and answers
- Switch from using a single reputation score for each user to using one score for each tag reflecting frequently asked/answered questions, so that their expertise can be better characterized
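The first two recommendations above (scan posts, then attach a warning or tag) can be sketched roughly as follows. This is a deliberately naive illustration, not the researchers' tooling: it matches plain substrings, whereas a real static checker would parse and analyze the code. The class name, the rule list and the `possibly-insecure` tag are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a naive text-based checker, not a real static analyzer.
class SnippetWarningSketch {

    // Illustrative rules only (substring to look for -> warning to attach).
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("getInstance(\"MD5\")", "MD5 is not collision-resistant");
        RULES.put("AES/ECB", "ECB mode leaks patterns in the plaintext");
        RULES.put("ALLOW_ALL_HOSTNAME_VERIFIER", "hostname verification is disabled");
    }

    // Returns one warning for every rule whose pattern appears in the snippet.
    static List<String> scan(String snippet) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (snippet.contains(rule.getKey())) {
                warnings.add(rule.getValue());
            }
        }
        return warnings;
    }

    // Returns the (assumed) tag for a flagged post, or an empty string if nothing matched.
    static String tagFor(String snippet) {
        return scan(snippet).isEmpty() ? "" : "possibly-insecure";
    }

    public static void main(String[] args) {
        String post = "Cipher c = Cipher.getInstance(\"AES/ECB/PKCS5Padding\");";
        System.out.println(scan(post));
        System.out.println(tagFor(post));
    }
}
```

A substring match like this will miss variations and flag harmless mentions, which is why the recommendation points to proper static checkers rather than keyword filters.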
Stack Overflow’s gamification approach for incentivizing users is also ineffective when it comes to improving the security properties of distributed code examples. In fact, since answering more questions leads to improved reputation, contributors are effectively encouraged to provide duplicated, less useful, or insecure coding suggestions.