Researcher publishes 10 million usernames and passwords to aid future research
Independent IT security analyst Mark Burnett has released a cleaned-up cache of 10 million username and password combinations to give researchers a data set they can analyze for insights into user behavior, which can in turn be used to improve authentication practices.
Years ago, this move would have carried little or no risk, but after the Barrett Brown case and the changes to the Computer Fraud and Abuse Act recently proposed by the White House, Burnett felt that the release should be accompanied by an unequivocal disclaimer.
“Recent events have made me question the prudence of releasing this information, even for research purposes. The arrest and aggressive prosecution of Barrett Brown had a marked chilling effect on both journalists and security researchers. Suddenly even linking to data was an excuse to get raided by the FBI and potentially face serious charges. Even more concerning is that Brown linked to data that was already public and others had already linked to,” he explained in a blog post.
“Although researchers typically only release passwords, I am releasing usernames with the passwords. Analysis of usernames with passwords is an area that has been greatly neglected and can provide as much insight as studying passwords alone,” he noted, and added that the sole intent of this release “is to further research with the goal of making authentication more secure and therefore protect from fraud and unauthorized access.”
To be on the safe side, the released cache has been cleaned up: the domain portion of email addresses has been removed, as have keywords that might indicate the source of the login information. Information that might be linked to an individual, or that might be a credit card or financial account number, has also been taken out, as have accounts belonging to government or military employees.
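The cleanup steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Burnett's actual tooling: the keyword list, the `.gov`/`.mil` suffix check, the use of a Luhn checksum to spot card-like numbers, and the choice to drop a whole record rather than edit it are all hypothetical.

```python
import re

# Hypothetical keyword list; the article does not say which keywords were removed.
SOURCE_KEYWORDS = {"examplesite", "exampleforum"}
# Assumed markers for government or military email addresses.
RESTRICTED_SUFFIXES = (".gov", ".mil")


def luhn_valid(digits: str) -> bool:
    """Return True if a string of ASCII digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_card_number(value: str) -> bool:
    """Flag values that are 13-19 digits (ignoring spaces and dashes) and pass Luhn."""
    digits = re.sub(r"[ -]", "", value)
    return re.fullmatch(r"[0-9]{13,19}", digits) is not None and luhn_valid(digits)


def sanitize(username: str, password: str):
    """Return a cleaned (username, password) pair, or None if the record is dropped."""
    # Keep only the part before "@", discarding the email domain.
    local, _, domain = username.partition("@")
    if domain.lower().endswith(RESTRICTED_SUFFIXES):
        return None
    if looks_like_card_number(local) or looks_like_card_number(password):
        return None
    combined = (local + " " + password).lower()
    if any(keyword in combined for keyword in SOURCE_KEYWORDS):
        return None
    return local, password
```

For example, `sanitize("alice@example.com", "hunter2")` keeps the record with the domain stripped, while a record whose username ends in `.gov` or whose password is a card-like number is dropped entirely.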
The included passwords are mostly “dead”: they date back five to ten years, were previously published online, and have therefore likely been misused and reset by now. “Ultimately, to the best of my knowledge these passwords are no longer valid and I have taken extraordinary measures to make this data ineffective in targeting particular users or organizations,” Burnett commented.
“Having said all that, I think this is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution or legal harassment. I had wanted to write an article about the data itself but I will have to do that later because I had to write this lame thing trying to convince the FBI not to raid me,” he concluded.
The data set is available for download via a link from his post.