Researchers open source tools to identify Twitter bots at scale
Duo Security published technical research and a methodology detailing how to identify automated Twitter accounts, known as bots, at scale. Using machine learning algorithms to identify bot accounts across their dataset, Duo Labs researchers unraveled a sophisticated cryptocurrency scam botnet consisting of at least 15,000 bots, and identified tactics used by malicious bots to appear legitimate and evade detection, among other findings.
The research
From May to July 2018, researchers collected and analyzed 88 million public Twitter accounts comprising more than half a billion tweets – one of the largest random datasets of Twitter accounts studied to date.
Duo’s dataset was built from information collected through the publicly available Twitter API, and includes each account’s screen name, tweet count, follower/following counts, avatar and bio. The content of tweets and the social network connections of accounts were also gathered as platform API limits allowed.
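To illustrate the kind of per-account metadata described above, the sketch below flattens a Twitter API v1.1 user object into the profile fields the researchers collected. The field names follow Twitter's documented v1.1 user object; the `summarize_account` helper itself is hypothetical and not part of Duo's tooling.

```python
def summarize_account(user: dict) -> dict:
    """Flatten a Twitter API v1.1 user object into the profile
    fields collected in the study (hypothetical helper)."""
    return {
        "screen_name": user.get("screen_name"),
        "tweet_count": user.get("statuses_count"),
        "followers": user.get("followers_count"),
        "following": user.get("friends_count"),
        "avatar_url": user.get("profile_image_url_https"),
        "bio": user.get("description"),
    }

# Minimal, fabricated user object for demonstration:
sample = {
    "screen_name": "example_user",
    "statuses_count": 1200,
    "followers_count": 350,
    "friends_count": 400,
    "profile_image_url_https": "https://example.com/avatar.png",
    "description": "Just an example account.",
}
print(summarize_account(sample)["followers"])  # → 350
```

In practice, collecting these objects at scale means paging through the API under rate limits, which is why tweet content and network connections were only gathered "as platform API limits allowed."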
Highlights of the research include:
- New open-source tools and techniques that can be used to discover and unravel large-scale botnets.
- Analysis of one of the largest random Twitter data sets to date, including the application of 20 unique account characteristics in a machine learning model to differentiate a human Twitter account, classified as “genuine” in the study, from a bot. These characteristics include, among others, the time between tweets, distinct tweet sources and the average number of hours per day an account is active.
- Discovery and details of a sophisticated cryptocurrency scam botnet, consisting of at least 15,000 bots, including how it siphons money from unsuspecting users by spoofing cryptocurrency exchanges, celebrities, news organizations, verified accounts and more. Accounts in the cryptocurrency scam botnet were programmed to deploy deceptive behaviors in an attempt to appear genuine and evade automatic detection.
- Mapping of the cryptocurrency scam botnet’s three-tiered, hierarchical structure, consisting of scam publishing bots, “hub” accounts that other bots often followed and amplification bots that like tweets in order to artificially inflate the tweet’s popularity and make the scam link appear legitimate.
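The three-tiered structure described above can be sketched as a simple role assignment over a follow/like graph. Everything here – the account names, the thresholds, and the `assign_roles` helper – is illustrative only, not Duo's actual mapping code.

```python
from collections import Counter

def assign_roles(follows, likes, hub_threshold=3):
    """Illustrative role assignment for a scam botnet graph.

    follows: list of (follower, followee) edges between accounts
    likes:   list of (liker, tweet_author) like events

    Accounts followed by many bots resemble "hub" accounts; accounts
    that mostly like others' tweets resemble amplification bots; the
    rest are treated as scam-publishing bots. The threshold is an
    arbitrary assumption for demonstration.
    """
    in_degree = Counter(followee for _, followee in follows)
    like_count = Counter(liker for liker, _ in likes)
    accounts = ({a for edge in follows for a in edge}
                | {a for edge in likes for a in edge})

    roles = {}
    for acct in accounts:
        if in_degree[acct] >= hub_threshold:
            roles[acct] = "hub"
        elif like_count[acct] > 0:
            roles[acct] = "amplification"
        else:
            roles[acct] = "publisher"
    return roles

# Tiny fabricated graph: three bots follow one hub,
# two amplification bots like a publisher's tweets.
follows = [("b1", "hub1"), ("b2", "hub1"), ("b3", "hub1")]
likes = [("amp1", "b1"), ("amp2", "b1")]
roles = assign_roles(follows, likes)
print(roles["hub1"])  # → hub
```

A real mapping would work from crawled follow lists and like activity rather than hand-built edge lists, but the structural idea is the same: roles fall out of how accounts connect, not what they say.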
Duo researchers actively observed Twitter suspending cryptocurrency scam bots, as well as quickly identifying hijacked verified accounts and returning them to their rightful owners. Despite these ongoing efforts, portions of the studied cryptocurrency botnet remain active.
“Users are likely to trust a tweet more or less depending on how many times it’s been retweeted or liked. Those behind this particular botnet know this, and have designed it to exploit this very tendency,” said Data Scientist Olabode Anise.
“The bots’ attempts to thwart detection demonstrate the importance of analyzing an account holistically, including the metadata around the content. For example, bot accounts will typically tweet in short bursts, causing the average time between tweets to be very low. Documenting these patterns of behavior can also be used to identify other malicious and spam botnets.”
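The timing pattern Anise describes – short bursts of tweets and a low average gap between them – can be computed directly from an account's tweet timestamps. The sketch below is a minimal illustration; the one-hour threshold is an arbitrary assumption, not a value from the study.

```python
from datetime import datetime

def mean_gap_seconds(timestamps):
    """Average time in seconds between consecutive tweets."""
    ts = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    return sum(gaps) / len(gaps) if gaps else float("inf")

def looks_bursty(timestamps, threshold_seconds=3600):
    """Flag accounts whose average inter-tweet gap is suspiciously
    low (threshold is an illustrative assumption)."""
    return mean_gap_seconds(timestamps) < threshold_seconds

# A bot-like burst: four tweets ten seconds apart.
burst = [datetime(2018, 7, 1, 12, 0, s) for s in (0, 10, 20, 30)]
print(looks_bursty(burst))  # → True

# A human-like pattern: tweets days apart.
human = [datetime(2018, 7, 1), datetime(2018, 7, 3), datetime(2018, 7, 8)]
print(looks_bursty(human))  # → False
```

On its own, one timing feature like this is weak evidence; the study's point is that combining many such metadata signals holistically is what separates bots from genuine accounts.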
Twitter’s response
In response to the research, which was shared with Twitter prior to publishing, a Twitter spokesperson said that the company is aware of this form of manipulation and is proactively implementing a number of detections to prevent these types of accounts from engaging with others in a deceptive manner.
“Spam and certain forms of automation are against Twitter’s rules. In many cases, spammy content is hidden on Twitter on the basis of automated detections. When spammy content is hidden on Twitter from areas like search and conversations, that may not affect its availability via the API. This means certain types of spam may be visible via Twitter’s API even if it is not visible on Twitter itself. Less than 5% of Twitter accounts are spam-related,” the spokesperson added.
“Malicious bot detection and prevention is a cat-and-mouse game,” said Duo Principal R&D Engineer Jordan Wright. “We anticipate that enlisting the help of the research community will enable discovery of new and improving techniques for tracking bots. However, this is a more complex problem than many realize, and as our paper shows, there is still work to be done.”
Wright and Anise will present their research on Wednesday at the 2018 Black Hat USA security conference in Las Vegas. Following the presentation, they will make their research tools available on GitHub to enable other researchers to identify automated Twitter accounts at scale.