How to create effective CAPTCHAs
Half a year ago, a team of researchers from Stanford University’s Security Laboratory has managed to build a computer program able to solve audio CAPTCHAs and to define how future audio CAPTCHAs should sound like in order to avoid being cracked by computers.
Since then, the same team has concentrated on making that same software (“Decaptcha”) able to break currently used text-based CAPTCHAs in order to reveal their strengths and weaknesses and learn from them.
Real-world CAPTCHAs differ one from another in many ways, as one can see from the types used by 15 popular websites against whose CAPTCHAs the researchers tested their tool (click on the screenshot to enlarge it):
All of them use a combination of anti-recognition and anti-segmentation techniques that are aimed at confusing bots and prevent automatic solving. Among the first are the use of multiple fonts, different charset schemes and font sizes, character blurring and rotating, and distortion of CAPTCHAs. In the second category falls the use of complex backgrounds, the positioning of lines across the characters and the removing of space between them.
Of the 15 CAPTCHAs tested, only 2 resisted the attack: Google’s and reCAPTCHA’s (reCAPTCHA is owned by Google). Of the remaining 13, some where easier (Authorize, Blizzard, Captcha.net, Megaupload, NIH) to solve than others (Baidu, Skyrock).
The testing has allowed them to conclude that successful automated attacks are more difficult to achieve it the length of the CAPTCHAs and the size of the characters is randomized and if wave shaping, collapsing or overlayed lines are used.
Some of these methods are effective only when used together or in a specific way. For example, collapsing works properly only if the size and the number of characters are random and lines must cross multiple characters.
Alternative CAPTCHA schemes that can be rolled out in case of a break are also a good idea.
For more details about the research and its results, download the paper here.