US startup claims to have 'solved' CAPTCHAs in a breakthrough for AI

The ability to decipher the distorted words of CAPTCHAs shows an advanced ability to 'imagine' what the broken letters might be

James Vincent
Monday 28 October 2013 15:44 GMT

Vicarious, a San Francisco startup specialising in artificial intelligence, claims to have created a machine capable of cracking CAPTCHAs - the blocks of distorted text that are used online to “prove that you’re human”.

Although CAPTCHAs might not appear to be the most sophisticated test of machine intelligence, they are notoriously difficult for algorithms to decipher, and the most efficient way of bypassing them has been to hire cheap manual labour to solve them by hand.

It’s been said that Google’s reCAPTCHA system (the most widely used on the internet) would be considered beaten if a computer could answer it correctly just one per cent of the time. Vicarious say that their software solves reCAPTCHAs with 90 per cent accuracy, and have uploaded a video showing the code in action.

“We wanted to show we could take the first step toward a machine that works like a human brain, and that we are the best place in the world to do artificial intelligence research,” co-founder D. Scott Phoenix told Reuters, noting that the company does not intend to use its breakthrough for any nefarious means but that it represents a new approach to AI.

Vicarious claims that its methods are even more impressive than the cutting-edge “deep learning” displayed by current AI titans such as IBM’s Watson.

In 2011 Watson showed its capacity to understand natural language questions by competing on a special edition of US quiz show Jeopardy with past champions, but Vicarious claims that this sort of AI relies more on computational power and catalogued examples than actual ‘intelligence’.

Speaking to Forbes, Vicarious co-founder Dileep George said that his company is working on “the math behind the processes of the brain”, and on systems that can “imagine” how to fill in blanks in vision just as humans can.

However, Vicarious are refusing to reveal any further technical details of their breakthrough, and some experts are sceptical.

“CAPTCHAs have been around since 2000, and since 2003 there have been stories every six months claiming that computers can break them,” Luis von Ahn of Carnegie Mellon University, a co-inventor of CAPTCHAs and founder of reCAPTCHA, a tech start-up which was sold to Google in 2009, told Reuters.

“Even if it happens with letters, CAPTCHAs will use something else, like pictures that only humans can identify against a distorting background,” said von Ahn.

Coincidentally, Google also announced an update to their reCAPTCHA system before the weekend, with the new changes designed “to learn how to better protect users from attackers”.

The updated CAPTCHAs use numbers instead of text (to make things easier on the human eye) and the system “actively considers the user’s entire engagement with the CAPTCHA – before, during and after they interact with it” to better distinguish the bot from the human.

Whether Google’s announcement has anything to do with the news from Vicarious is unknown, but the search giant are promising they have “even more to report on in the next few months” with regard to the technology.

Vicarious attracted more than $15 million in funding last year from investors including Facebook co-founder Dustin Moskovitz and ex-PayPal CEO Peter Thiel’s Founders Fund.

Although the company does not plan to release any products for at least several years, Moskovitz (who also sits on the Vicarious board of directors) released a statement saying that “we should be careful not to underestimate the significance of Vicarious crossing this milestone,” describing the company as “at the forefront of building the first truly intelligent machines.”

reCAPTCHA is notable for its help in deciphering unreadable words from digitized books. It's perhaps one of the largest (and still relatively unheard of) crowd-sourcing efforts in the world.

CAPTCHA or reCAPTCHA: what’s the difference?

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart” and is a trademark owned by Carnegie Mellon University.

The identity of the original inventors is disputed, but it’s certain that CAPTCHAs were developed in the late 1990s to differentiate bots from genuine human users on websites. Bots might otherwise sign up for free email addresses and use them to distribute spam; CAPTCHAs were designed to stop them.

reCAPTCHA is an evolution of the original system that was developed by the same team from Carnegie Mellon and acquired by Google in 2009.

The most notable advantage of the reCAPTCHA system is its help in digitizing books. reCAPTCHA asks users to identify two strings of numbers or letters – one is taken from a scanned newspaper or book that OCR (optical character recognition) technology has failed to interpret; the other is a word or sequence of letters already known to the software.

If the user correctly deciphers the known word, then the software assumes that they have succeeded with the unknown string and sends back the data to the project that supplied the image.
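The check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's code: the function names, the in-memory vote store and the rule that a transcription is accepted after three matching answers are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: for each unknown word image, count the
# answers typed by users who got the known (control) word right.
votes = defaultdict(Counter)


def normalise(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def check_response(control_answer, unknown_id, typed_control, typed_unknown):
    """Pass the user only if the known word is typed correctly.

    When the user passes, their reading of the unknown word is trusted
    and recorded as one vote for that image.
    """
    if normalise(typed_control) != normalise(control_answer):
        return False
    votes[unknown_id][normalise(typed_unknown)] += 1
    return True


def consensus(unknown_id, min_votes=3):
    """Return the most common reading once it has at least min_votes.

    The threshold is an assumption for this sketch; it stands in for
    whatever rule decides that a word can be sent back to the project
    that supplied the image.
    """
    if not votes[unknown_id]:
        return None
    guess, count = votes[unknown_id].most_common(1)[0]
    return guess if count >= min_votes else None
```

A user who mistypes the known word is rejected and their guess at the unknown word is discarded, so only answers from people who have shown they can read the distorted text are counted.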

One scheme that reCAPTCHA is currently working on is the digitization of the archives of The New York Times. As of 2012, thirty years of the Times had been digitized, and the project was expected to be completed by the end of that year.
