What is reCAPTCHA? reCAPTCHA Business Model

On websites like Twitter, Facebook, Craigslist or even Ticketmaster, users are often asked to solve a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) by typing out text that is deliberately hard to read. The CAPTCHA is a way of checking whether the user is a human or a bot. CAPTCHAs have helped keep malicious programs that send spam at bay.

What is reCAPTCHA?

reCAPTCHA was designed by Luis von Ahn, Manuel Blum, Colin McMillen, Ben Maurer and David Abraham at Carnegie Mellon University’s main Pittsburgh campus. Created to establish whether a user is human, it took the internet by storm by also aiding in the digitization of books. Google acquired reCAPTCHA in September 2009 and has been using it as an important part of its Google Books project, in which rare, ancient and out-of-print books are digitized and offered to the public.

With the use of reCAPTCHA, humans digitized over 13 million archived articles of The New York Times from the past 20 years or so in just a few months. Through mass collaboration, books that computers cannot read on their own are digitized as well as translated into different languages. reCAPTCHA also distorts the CAPTCHA words further to reduce the chance that an automated attack program can solve them.

“This project is 99.1% comparable to the best human professional transcription services,” claims the CyLab institute of Carnegie Mellon University.

How Does reCAPTCHA Work?

Given the altruistic uses of its technology and data, reCAPTCHA has a very interesting business model. reCAPTCHA charges companies for using its verification. Every word shown in a reCAPTCHA is a scanned word from one of the millions of texts in the world. After a book is scanned, the text is analyzed by two different optical character recognition (OCR) programs. A standard string-matching algorithm compares the results of the two programs with each other and with a dictionary. Any word that the OCR programs cannot read, or that they decipher differently, is turned into a CAPTCHA so that a human can solve it. Every such suspicious word is paired with an already deciphered word, called the control word, and the two are shown on the screen together. If a human types the control word correctly, their response to the suspicious word is tagged as probably valid. When three distinct humans who typed the control word correctly give the same reading of the suspicious word, that word is considered deciphered.
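The pipeline described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not reCAPTCHA's actual code: the two OCR outputs are assumed to be already aligned word for word, plain case-insensitive equality stands in for the string-matching algorithm, and `DICTIONARY`, `find_suspicious_words`, `Challenge` and `check_response` are hypothetical names invented for this example.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a real dictionary; the article does not say which one is used.
DICTIONARY = {"the", "quick", "brown", "fox", "metal", "wand"}


def find_suspicious_words(ocr_a, ocr_b, dictionary=DICTIONARY):
    """Return the positions of words that need a human.

    A word is suspicious when the two OCR programs disagree on it,
    or when their shared reading is not in the dictionary.
    Assumption: both outputs are aligned, one entry per scanned word.
    """
    if len(ocr_a) != len(ocr_b):
        raise ValueError("OCR outputs must be aligned to the same length")
    suspicious = []
    for index, (word_a, word_b) in enumerate(zip(ocr_a, ocr_b)):
        a, b = word_a.lower(), word_b.lower()
        if a != b or a not in dictionary:
            suspicious.append(index)
    return suspicious


@dataclass
class Challenge:
    control_word: str      # already deciphered, so the answer is known
    suspicious_index: int  # position of the unknown word in the scan


def check_response(challenge, typed_control, typed_suspicious):
    """Return the user's reading of the suspicious word, or None.

    The reading only counts as "probably valid" when the control
    word was typed correctly.
    """
    if typed_control.strip().lower() != challenge.control_word.lower():
        return None
    return typed_suspicious.strip().lower()
```

In this sketch a user who mistypes the control word contributes nothing, which mirrors the idea that only people who pass the known half of the test are trusted on the unknown half.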

This validation uses a point scale: a reading of the word earns 0.5 points from each OCR program that produces it and 1.0 point from each human who types it. When a reading reaches 2.5 points, the word is considered valid. Words that are consistently validated by humans are reused as control words. If a word is answered wrongly six times, it is designated as unreadable. In the original reCAPTCHA the two words are shown as separate, out-of-context words rather than being taken from the same original document, in order to avoid confusion between them. Even so, there are instances where the control word misleads the user about the second word. For example, if the two words given are ‘metal’ and ‘wand’, people would usually type ‘metal band’, because ‘band’ appears more often next to ‘metal’.
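The point scale can be illustrated with a small tally, sketched below under assumptions. The weights, the 2.5 threshold and the limit of six come from the description above; the class name `WordTally`, its methods, and the choice to count an empty or missing answer as a "wrong" attempt are assumptions made for this example, since the article does not say how a wrong answer is detected.

```python
from collections import defaultdict

OCR_WEIGHT = 0.5       # points per OCR program that produced a reading
HUMAN_WEIGHT = 1.0     # points per human who typed a reading
VALID_THRESHOLD = 2.5  # points a reading needs to be accepted
MAX_MISSES = 6         # assumption: blank answers are counted as misses


class WordTally:
    """Accumulates points for each candidate reading of one scanned word."""

    def __init__(self, ocr_readings):
        self.points = defaultdict(float)
        self.misses = 0
        for reading in ocr_readings:
            if reading:  # an OCR program may fail to produce anything
                self.points[reading.lower()] += OCR_WEIGHT

    def add_human_reading(self, reading):
        """Record one human answer; None or blank counts as a miss."""
        cleaned = (reading or "").strip().lower()
        if cleaned:
            self.points[cleaned] += HUMAN_WEIGHT
        else:
            self.misses += 1

    def status(self):
        """Return ("valid", reading), ("unreadable", None) or ("pending", None)."""
        best = max(self.points, key=self.points.get, default=None)
        if best is not None and self.points[best] >= VALID_THRESHOLD:
            return ("valid", best)
        if self.misses >= MAX_MISSES:
            return ("unreadable", None)
        return ("pending", None)
```

Under this scheme three matching human answers are enough on their own (3.0 points), while a reading already backed by one OCR program needs only two humans (0.5 + 1.0 + 1.0 = 2.5).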

reCAPTCHA offers plug-ins and libraries for platforms such as Ruby, PHP and ASP.NET to ease the implementation of its service. A JavaScript API that calls back to reCAPTCHA’s servers supplies the words for the CAPTCHAs. Though websites obtain the CAPTCHA images free of cost in exchange for their help in deciphering the texts, reCAPTCHA is not open-source software.
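For readers wondering what the server side of such an integration looks like, here is a minimal sketch in Python using only the standard library. It targets the present-day `siteverify` endpoint with its `secret`, `response` and optional `remoteip` fields, not the older word-serving API described above, and the function names `build_verify_request`, `is_human` and `verify` are made up for this example. Treat it as an outline rather than a drop-in implementation.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret, token, remote_ip=None):
    """Build the POST request that asks reCAPTCHA to validate a token.

    secret: the site's private key; token: the value the widget
    submitted with the form. Both are supplied by the caller.
    """
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def is_human(response_body):
    """Interpret the JSON reply: only an explicit success=true passes."""
    payload = json.loads(response_body)
    return payload.get("success") is True


def verify(secret, token, remote_ip=None, timeout=5):
    """Send the request over the network and return True on success."""
    request = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return is_human(resp.read().decode("utf-8"))
```

Keeping the request building and the reply parsing separate from the network call makes both easy to test without contacting the service.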

In 2012, reCAPTCHA began using photographs of house numbers from Google’s Street View project in addition to scanned text. In 2013, it introduced behavioural analysis, presenting more difficult CAPTCHAs to users it suspected of being bots. By 2014, however, this had been dropped from Google’s services in favour of another system, in which people are asked to select a few images from a set of nine. In 2017, reCAPTCHA went a step further with ‘invisible reCAPTCHA’, a mechanism that needs no user interaction.

“Invisible reCAPTCHA creates a new sort of challenge that very advanced bots can still get around, but introduces a lot less friction to the legitimate human,” says former Google click fraud czar Shuman Ghosemajumder.

reCAPTCHA has demonstrated the power of hidden crowdsourcing: it gets work done by people who do not even realize the impact they are having on the internet. Since it asks for no additional effort from them, the project is about as effective as such a scheme can be.
