How programmers can be tricked into running bad code
Are programming language package managers vulnerable to typosquatting attacks? And can these attacks result in software developers running potentially malicious code? The answer to both these questions is yes.
This was demonstrated by University of Hamburg student Nikolai Philipp Tschacher who, for his bachelor's thesis, created packages with names very similar to those of 214 popular packages and uploaded them to PyPI, npmjs.com, and rubygems.org, the package repositories for Python, Node.js and Ruby.
The experiment and the results
“166 of these package names were created algorithmically with edit distance algorithms which covered all possible typos of two chosen names. The rest of these package names were chosen according to (assumed) human propensity of misspelling package names,” he explained.
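To make the algorithmic part concrete, here is a minimal sketch of how names one edit away from a package name can be enumerated. It is not Tschacher's code: the function name typo_candidates, the ALPHABET of allowed characters, and the choice of four edit types (deletion, transposition, substitution, insertion) are assumptions made for illustration.

```python
import string

# Assumed character set for package names (illustrative, not a registry rule).
ALPHABET = string.ascii_lowercase + string.digits + "-_"


def typo_candidates(name):
    """Return the set of strings exactly one edit away from `name`."""
    # Every way of cutting the name into a left and a right part.
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]

    # Drop one character.
    deletes = {a + b[1:] for a, b in splits if b}
    # Swap two adjacent characters.
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    # Replace one character with another.
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    # Insert one extra character.
    inserts = {a + c + b for a, b in splits for c in ALPHABET}

    candidates = deletes | transposes | replaces | inserts
    candidates.discard(name)  # the correct spelling is not a typo
    candidates.discard("")    # an empty string is not a usable name
    return candidates


if __name__ == "__main__":
    names = typo_candidates("requests")
    print(len(names), "candidate names, for example:", sorted(names)[:5])
```

Even a short name yields hundreds of candidates, which is why covering "all possible typos" was only practical for a couple of chosen names.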
The test, performed between early November 2015 and late January 2016, resulted in his specially crafted packages being executed 45,334 times by 17,289 unique hosts, and around 50 percent of the confirmed installations were conducted with administrative rights.
Tschacher’s packages were not malicious. They included software that collected some information about the host (but no personal data) and sent it to a web server hosted in his university’s network.
This allowed him to see what kinds of systems the code was executed on (44% on Linux systems, 31.5% on Windows hosts, 24.2% on OS X computers, with the rest mostly reporting FreeBSD or Java as their platform).
Also, it showed that many of the hosts on which the packages were executed belonged to educational institutions, and some of the requests were from .gov and .mil domains.
This data collection was performed without permission from the owners of the “infected” hosts, but Tschacher argues that it was necessary.
“The real threat at hand, code execution on remote systems, cannot be shown if the empirical research is conducted without using the notification program,” he says. “By recording a successful installation with an open TCP connection to a university server and the sending of captured information, it becomes obvious that a malicious attacker could have easily installed malware instead. Therefore, the execution of code on foreign systems was regarded necessary to demonstrate the seriousness of the situation.”
The implications of the findings
People frequently make typos when they install packages with the package manager client for their favorite programming language.
Tschacher didn’t find any malicious packages exploiting this fact in the wild, but he did find several that alerted users that they had mistyped the name of the package they wanted and urged them to be more careful in the future.
“It can be concluded that some people are aware of the typosquatting threat and that the simple idea behind it is well known,” he pointed out, adding that it’s likely that, sooner or later, criminals will exploit this method of infecting computers.
“Having infected hundreds of research institutions and various well known universities demonstrates how serious the consequences of typosquatting attacks are. Possible real life attacks could exploit Python’s proximity to the scientific community to intrude in networks with sensitive research institutions. The same applies to the private cooperations that make use of these programming languages. Therefore, one must see typosquatting as a kind of attack which is very easy to conduct in a not-targeted way,” Tschacher noted.
“Possible attack targets could be in the industry or scientific laboratories. Even though typosquatting attacks are not targeted, if one waits long enough, the possibility grows that someone misspells a name and installs a typo package.”
The notification program in Tschacher’s benign typosquatting packages pointed users who ran them to a URL that explained the experiment, but the minuscule number of visits to that page showed that most users likely never noticed anything suspicious about the installation.
And that’s a problem.
Existing package repositories already implement some defense mechanisms to minimize the risk of this type of attack, and Tschacher has offered a few more, but software developers would also do well to be more careful about what code they run.
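As one illustration of what "being more careful" could look like, the sketch below flags a requested package name that closely resembles, but does not exactly match, a well-known one. It is not one of the defenses Tschacher proposed or one that the repositories implement: the POPULAR list, the near_misses function and the 0.8 similarity cutoff are all assumptions chosen for the example. The only library call used is difflib.get_close_matches from the Python standard library.

```python
import difflib

# Hypothetical allow-list; a real check would use a much larger list
# of package names the team actually depends on.
POPULAR = ["requests", "numpy", "django", "flask"]


def near_misses(requested, known=POPULAR, cutoff=0.8):
    """Return known names that `requested` looks like a misspelling of."""
    if requested in known:
        return []  # exact match: nothing suspicious
    # get_close_matches ranks `known` by similarity to `requested`
    # and keeps only those scoring at least `cutoff`.
    return difflib.get_close_matches(requested, known, n=3, cutoff=cutoff)


if __name__ == "__main__":
    for name in ["requests", "reqeusts", "leftpad"]:
        hits = near_misses(name)
        if hits:
            print(f"{name!r}: did you mean {hits[0]!r}?")
        else:
            print(f"{name!r}: no close match to a known name")
```

A wrapper like this could run before an install command and ask for confirmation when it finds a near miss. It cannot tell whether the package behind a name is malicious, only that the name is suspiciously close to a familiar one.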