CAPTCHA the Internet
Posted by Tom on February 21st, 2006
CAPTCHA (an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”) has been on my mind ever since Phil Windley suggested a graphical CAPTCHA would make a good web service. I thought there might be those willing to pay to use it. Well, it’s been done.
There is a need for this type of test. Yahoo! and Hotmail use a CAPTCHA to stave off spammers when a user requests an email account. I suspect the most common use on other sites is an attempt to block automated comment spam in blogs.
CAPTCHA excludes legitimate users
As the W3C points out, graphical CAPTCHAs are a significant barrier to low-vision and blind users. Those with learning disabilities, such as dyslexia, may also be adversely affected. As visual CAPTCHAs become more sophisticated, busy, patterned backgrounds become more of an issue for color-blind users.
The U.S. Census Bureau estimated that in 1997 about 7.7 million Americans had difficulty seeing the words and letters in an ordinary newspaper. The American Foundation for the Blind reported that about 5 in 1,000 Americans are legally blind, and gives a low estimate of 1.5 million visually impaired computer users. That’s a fairly significant potential market to ignore.
Requiring users to interpret a visual CAPTCHA may lead to legal challenges. Earlier this month, the National Federation of the Blind filed suit against Target, claiming target.com discriminates by not being accessible to visually impaired users.
Audio CAPTCHA
Some companies are experimenting with audio CAPTCHAs, spelling out random letters with random noise in the background. However, aural disabilities are more common than visual ones, so the approach isn’t really more accessible. Speech recognition software is more advanced than character recognition, so the purported purpose of differentiating between humans and computers is not fulfilled anyway.
CAPTCHA is broken
Several projects to crack common visual CAPTCHA algorithms, particularly The CAPTCHA Project (by the Carnegie Mellon School of Computer Science), the UC Berkeley Computer Vision Group, and Sam Hocevar’s PWNtcha, have had good success. Howard Yeend demonstrated a vulnerability in several public algorithms where he could reuse a solution several thousand times after manually solving it once.
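The reuse problem can be illustrated with a small sketch. It assumes the weakness is that a solved challenge stays valid after its first use; the details of the affected algorithms are not given here, so this is one possible defense and not a description of any particular implementation. The class name ChallengeStore and its methods are hypothetical.

```python
import secrets


class ChallengeStore:
    """Hypothetical in-memory store that lets each challenge be answered once."""

    def __init__(self) -> None:
        self._pending = {}  # token -> expected solution

    def issue(self, solution: str) -> str:
        """Register a new challenge and return the token sent with the form."""
        token = secrets.token_hex(16)
        self._pending[token] = solution
        return token

    def verify(self, token: str, answer: str) -> bool:
        """Check an answer and discard the token whether or not it was right."""
        solution = self._pending.pop(token, None)  # a second attempt finds nothing
        if solution is None:
            return False
        return secrets.compare_digest(
            solution.lower().encode("utf-8"),
            answer.strip().lower().encode("utf-8"),
        )
```

Because verify removes the token before comparing, a solution captured once cannot be replayed, and a wrong guess also burns the challenge.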
Social engineering is often easier than fancy programming. The first widely recognized social engineering solution was “borrowing” CAPTCHAs from target sites and showing them at entry points to porn sites. Visitors to porn sites would solve the CAPTCHAs, giving spammers essentially free labor. Amazon’s Mechanical Turk (tagline: “Artificial Artificial Intelligence”), which gives micro-payments for simple tasks, is an example of another way CAPTCHAs could be defeated. Even at a few cents per image, the cost may still be too high for spammers, but it is a demonstration that the process can be outsourced. After all, the world is flat.
What is the underlying purpose?
The real reason for CAPTCHA is to screen undesirables. For low traffic sites, it means preventing automated access. This can be accomplished in a relatively simple way: add a single required question to the comment submit form. Something like “What color was George Washington’s white horse?” or “Enter the fourth word in this sentence.” This is enough to make the form non-standard, thus unusable by generic bots. Bypassing this added security would be very easy for spammers; the advantage lies in the relative obscurity of most blogs. To target multiple blogs, a spammer would need to address each one individually. Individual attention is unlikely, so I suggest this method is the easiest for bloggers with a knowledge of web programming, and it is as accessible as a comment form without a CAPTCHA.
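A minimal sketch of the extra-question idea follows, in Python. The form field names (challenge_id, challenge_answer), the question bank, and the function names are all assumptions made for illustration; a real blog would wire the check into whatever comment handler it already has.

```python
import string

# Hypothetical question bank: id -> (question text, accepted answers).
CHALLENGES = {
    "horse": ("What color was George Washington's white horse?", {"white"}),
    "fourth": ("Enter the fourth word in this sentence.", {"word"}),
}


def normalize(answer: str) -> str:
    """Lowercase the answer and drop surrounding whitespace and punctuation."""
    return answer.strip().strip(string.punctuation).strip().lower()


def check_answer(challenge_id: str, answer: str) -> bool:
    """Return True only if the answer matches the question that was asked."""
    challenge = CHALLENGES.get(challenge_id)
    if challenge is None:
        return False
    _question, accepted = challenge
    return normalize(answer) in accepted


def accept_comment(form: dict) -> bool:
    """Reject any submission that lacks the extra field or answers it wrongly."""
    return check_answer(
        form.get("challenge_id", ""),
        form.get("challenge_answer", ""),
    )
```

A generic bot that posts only the standard comment fields fails accept_comment because the extra field is missing, which is the whole point: the protection comes from the form being non-standard, not from the question being hard.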
Major sites like Yahoo! and Google have a bigger problem. After all, they are targets both because of the value of their services and because of their size. When it first launched Gmail, Google limited accounts to those who had been invited by other active users. Initially there was a good bit of commotion in the tech community as gmail.com addresses became a sign of prestige. The invitation system allows Google to track which users may be abusing the service, and which users invited the abusers. Google has gone a step further, and now allows potential users to have an invitation code sent to their mobile phones. The number of accounts requested per phone number can be tracked. The small potential gain from a handful of throw-away email accounts, weighed against the cost of mobile phones (even disposable ones), is enough to deter spammers, because less troublesome alternatives exist.
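Tracking requests per phone number could look something like the sketch below. This is not a description of how Google does it; the limit of three codes, the class name InvitationLimiter, and the digits-only normalization are all assumptions chosen to keep the example short.

```python
from collections import Counter

MAX_CODES_PER_PHONE = 3  # hypothetical limit, not a documented figure


def normalize_phone(phone: str) -> str:
    """Keep only the digits so differently formatted numbers compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())


class InvitationLimiter:
    """Hypothetical counter of invitation codes sent to each phone number."""

    def __init__(self, limit: int = MAX_CODES_PER_PHONE) -> None:
        self.limit = limit
        self._sent = Counter()

    def request_code(self, phone: str) -> bool:
        """Return True if a code may be sent, and record that it was."""
        key = normalize_phone(phone)
        if not key or self._sent[key] >= self.limit:
            return False
        self._sent[key] += 1
        return True
```

The counter makes each additional account cost the spammer another phone number, which is where the deterrent described above comes from.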
If you look at Google’s account request page, you’ll see a CAPTCHA there. Google responsibly offers a way for users with disabilities to bypass the CAPTCHA, although it involves human-to-human interaction (and quite a bit more time) to complete—a costly alternative.
Real solutions
Several solutions to the problems with CAPTCHA have been proposed and debated. Most have major cost or accessibility problems.
It would seem the only good solution is some sort of federated identity system, which is really just offloading the trouble of user validation to someone else.

Good article on the present state of CAPTCHAs. Just reaffirms that there are no new ideas, and that Google makes that harsh reality even more obvious.
Phil’s blog was in response to a project I had built for doing captcha within a jsp tag library. I’ve since given up on the captcha project and removed it from my site due to the fact that the idea is patented (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/netahtml/search-bool.html&r=1&f=G&l=50&co1=AND&d=ptxt&s1=6195698.WKU.&OS=PN/6195698&RS=PN/6195698).
An interesting tidbit that other captcha enthusiasts may want to know: although the patent is currently registered to HP (via its acquisition of Compaq), HP reports that it has since sold the patent to an undisclosed recipient. They forwarded my inquiries to that recipient, and I got nothing back. Probably a greedy patent-abusing company gathering data in preparation for a spate of lawsuits.
Nathan Sandland
3600 Degrees
Left by Nathan Sandland on February 22nd, 2006