CAPTCHA: A story of old books, traffic lights and self-driving cars
Before we talk more about CAPTCHA, we need to talk briefly about Turing tests. Long before artificial intelligence was the cool thing to work on, Alan Turing, the great British mathematician, devised a test to tell a computer apart from a human. In the Turing test, a human interrogator asks the same questions to another human and to a computer. The interrogator does not know which one is the human and which is the computer. Based on their answers, the interrogator has to guess which one is the human. If the computer can fool the interrogator into believing that it is the human more than 50% of the time, the machine is considered to have passed the test. Since we do not yet have AI that matches human intelligence, there are still tasks that humans do easily and machines fail at. CAPTCHA uses this idea at its core.
CAPTCHA is a software test that checks whether the subject taking it is a human being or a machine. Passing the test counts as successful authentication for a human. Failing it is an authentication failure and keeps the machine from exploiting a protected resource such as account sign-ups or server time.
In the early 2000s, when I was a college student, I remember websites being filled with spam comments of all kinds. These were the days when blogging was the cool thing to do and Orkut was the social network of choice. There was no Facebook yet. Rogue programs could be written to spam blogs with comments, create numerous sign-ups and, in the worst case, launch unsophisticated denial-of-service attacks that brought down entire websites. A website, like any front-end interface, is meant for humans. But when machines can mimic human actions, that interface can be exploited for nefarious purposes. With a free automation tool like Selenium WebDriver, one can easily build a script in five minutes that performs actions in a web client.
CAPTCHA prevented spam attacks in a novel way: present the subject with a squiggly image of a word or phrase and challenge them to type it out. The task was simple for a human but, at the time, nearly impossible for a machine. That's sublime and brilliant! Completely Automated Public Turing test to tell Computers and Humans Apart: CAPTCHA. Luis von Ahn and his team coined the term in 2000 while at Carnegie Mellon University. Von Ahn went on to become a pioneer of crowdsourcing and co-founded the language-learning platform Duolingo.
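To make the idea concrete, here is a minimal sketch in Python of the server-side bookkeeping behind a text CAPTCHA, assuming the distorted image itself is rendered elsewhere. The class name, the alphabet that drops easily confused characters, and the one-attempt-per-challenge rule are illustrative choices, not how any particular CAPTCHA service works.

```python
import hmac
import secrets
import string

# Illustrative alphabet: drop characters that are easy to confuse once distorted (0/O, 1/I/L).
ALPHABET = "".join(c for c in string.ascii_uppercase + string.digits if c not in "0O1IL")


class TextCaptcha:
    """Hypothetical server-side store for text CAPTCHA challenges (image rendering omitted)."""

    def __init__(self) -> None:
        # challenge id -> expected answer; a real site would also expire old entries
        self._pending: dict[str, str] = {}

    def new_challenge(self, length: int = 6) -> tuple[str, str]:
        """Return (challenge_id, text); the text would be drawn as a squiggly image."""
        challenge_id = secrets.token_urlsafe(16)
        text = "".join(secrets.choice(ALPHABET) for _ in range(length))
        self._pending[challenge_id] = text
        return challenge_id, text

    def verify(self, challenge_id: str, answer: str) -> bool:
        """Pass if the typed answer matches; each challenge allows only one attempt."""
        expected = self._pending.pop(challenge_id, None)  # remove so it cannot be retried
        if expected is None:
            return False
        normalized = answer.strip().upper()  # ignore case and surrounding spaces
        return hmac.compare_digest(normalized.encode(), expected.encode())
```

A pass corresponds to the successful authentication described above. Removing the challenge on every attempt means a bot cannot keep guessing against the same image.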
CAPTCHA technology was also used to digitize thousands of old books in a very clever way. A standard CAPTCHA challenge was paired with a picture of a word or phrase from an old book that optical character recognition software could not read. Typing the known word correctly proved the user was human, and their answer for the unknown word was recorded as a transcription. This way thousands of old books were digitized by Google, and we helped do it unknowingly. It put to good use hundreds of thousands of hours of human effort that would otherwise have been wasted on solving CAPTCHA challenges. This technology is called reCAPTCHA. It was developed at Carnegie Mellon and acquired by Google in 2009.
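The two-word scheme can be sketched in Python as follows. This is a simplified illustration under assumptions: the class and method names are hypothetical, and the rule of accepting a reading once a fixed number of users agree on it (`votes_needed`) stands in for whatever weighting the real system used.

```python
from collections import Counter, defaultdict


class BookWordTranscriber:
    """Hypothetical sketch: a control word with a known answer proves the user is human,
    and their reading of the unknown book word is counted as one vote."""

    def __init__(self, votes_needed: int = 3) -> None:
        self.votes_needed = votes_needed  # assumed agreement threshold
        self.votes: dict[str, Counter] = defaultdict(Counter)  # word id -> reading counts
        self.transcribed: dict[str, str] = {}  # word id -> accepted reading

    def submit(self, control_answer: str, control_expected: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Return True if the user passes; record their unknown-word reading if so."""
        if control_answer.strip().lower() != control_expected.lower():
            return False  # failed the known word: fail the test and discard the vote
        reading = unknown_answer.strip().lower()
        self.votes[unknown_id][reading] += 1
        # Accept a transcription once enough humans have typed the same reading.
        if unknown_id not in self.transcribed and self.votes[unknown_id][reading] >= self.votes_needed:
            self.transcribed[unknown_id] = reading
        return True
```

Only users who pass the known word get a vote, which is what filters bots out of the transcription.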
Traffic lights and self-driving cars
The CAPTCHA challenges we see these days ask us to identify traffic lights and cars in an urban scene. Why always traffic lights and cars?
A common explanation is that Waymo, the self-driving car company owned by Google's parent Alphabet, uses these answers to train its AI models to better identify traffic lights and cars on the road. Just as we helped digitize books in the past, we may be helping Google make self-driving cars a reality.
No CAPTCHA reCAPTCHA
Google has also moved toward CAPTCHAs that ask you to solve nothing at all. reCAPTCHA v3 observes how a user behaves on a site and returns a score between 0.0 and 1.0, where 1.0 means the interaction is very likely from a human and 0.0 means it is very likely from a bot. As a developer, you decide what to do with the score, for example whether to present a challenge.
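On the server side, the token that reCAPTCHA v3 produces in the browser is checked by POSTing it, together with the site's secret key, to Google's `siteverify` endpoint, which returns JSON with fields such as `success`, `score` and `action`. The Python sketch below shows one way to act on the score. The `decide` helper, its three outcomes and the 0.5 threshold are assumptions for illustration; each site tunes its own threshold.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_token(secret: str, token: str, remote_ip: Optional[str] = None) -> dict:
    """POST the client's token to the siteverify endpoint and return the parsed JSON."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    data = urllib.parse.urlencode(fields).encode()
    with urllib.request.urlopen(SITEVERIFY_URL, data=data, timeout=5) as resp:
        return json.load(resp)


def decide(result: dict, expected_action: str, threshold: float = 0.5) -> str:
    """Hypothetical policy mapping a verification result to "allow", "challenge" or "reject"."""
    if not result.get("success") or result.get("action") != expected_action:
        return "reject"  # invalid or expired token, or one issued for a different action
    if result.get("score", 0.0) >= threshold:
        return "allow"  # closer to 1.0: likely a human
    return "challenge"  # closer to 0.0: likely a bot, so fall back to a visible challenge
```

A sign-up handler might call `decide(verify_token(SECRET, token), "signup")` and show an image challenge only when the result is `"challenge"`, which is the "decide whether to present a challenge" choice described above.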
The “I'm not a robot” checkbox, the original No CAPTCHA reCAPTCHA, works in a similar spirit: Google asks you only to tick a checkbox and uses attributes of your mouse movement to decide whether you are a machine or a human. If it is not confident, it falls back to an image challenge.
Breaking CAPTCHA
There are also shops where people sit in front of computers performing fake sign-ups, likes, reviews and so on. These are hard for any technology to catch because the ones interacting are humans, not computers. Meanwhile, recent advances in artificial intelligence and image recognition have made some traditional CAPTCHAs ineffective or vulnerable. The challenges presented will keep changing as machines become better and better at imitating humans.