Thank you spammers… – Cypris' lookout


I’ve been tired of spam on my websites. The few hundred messages spammers leave every day are a bit of a nuisance.
Now though, I’ve decided to make them work harder to get their messages ignored.
Last week, reCAPTCHA came online. It’s an effort inspired by none other than Luis von Ahn, so you know it’s good.

If you don’t know him, he’s the mastermind behind a series of projects built around a simple premise: make humans do the work that computers can’t do.
One of his ongoing projects is the _ESP Game_, where two online players try to agree on a common description of a random picture. It’s apparently an addictive game, and it helps solve a problem that computers are terrible at today: accurately describing the content of photographs.
Google is using his research to make image search more useful by returning content more relevant to your queries.

So, what does this have to do with CAPTCHAs?
CAPTCHAs were invented to solve a simple problem: stopping computers from automatically filling in web forms to create accounts on popular free services, which they would then use to send spam.
They are a visual version of the Turing Test, devised by the brilliant WWII cryptanalyst Alan Turing as a way to test how far machines could behave like humans: if a person, not knowing who she was interacting with, could not tell the difference between a human and a machine, then the machine passed the test. It’s a measure of the success (or lack of it) of artificial intelligence, and the idea spawned many others, including CAPTCHAs.

CAPTCHAs simply require that a small problem be solved before a web form can be submitted. Typical problems are blurry, distorted images of text or numbers that would be very hard for a computer alone to decipher but that our brains have no trouble reading.
An estimated 60 million CAPTCHAs are solved by human beings every single day. That’s a huge amount of wasted brain power, as nothing really useful comes out of it (apart from preventing spam, of course).

reCAPTCHA’s genius idea is to use that brain power to solve a problem that we would actually like computers to solve: digitising books.
Millions of books were printed before computers became ubiquitous, and no electronic version of them exists apart from scanned images of their pages.
Optical Character Recognition software is getting very good, but when the scan is of poor quality or the book is old, many words cannot be automatically recognised.
Humans on the other hand are quite good at reading words, even if they are badly distorted and barely recognisable.
Instead of making up a distorted image for you to decipher, reCAPTCHA simply presents you with two words, one it knows and one it doesn’t, and asks you to type both. The known word is used to check that you are human; your answer for the unknown word is kept as a guess.
Every unknown word is shown to several different people, so you end up with a very accurate reading of the word that can be fed back into the electronic version of the book being scanned.
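To make the two-word idea concrete, here is a toy voting model in Python. It is only a sketch under my own assumptions, not reCAPTCHA’s actual implementation: the class name `TwoWordChallenge`, the agreement threshold of 3, the word identifiers and the case-insensitive matching are all hypothetical. The known word decides whether the form goes through, and only answers from people who passed that check count as votes for the unknown word.

```python
from collections import Counter, defaultdict

# Hypothetical value: how many independent people must type the same
# reading before an unknown word is considered solved.
AGREEMENT_THRESHOLD = 3


class TwoWordChallenge:
    """Toy model of the two-word scheme: one known word, one unknown word."""

    def __init__(self, threshold=AGREEMENT_THRESHOLD):
        self.threshold = threshold
        # unknown word id -> how many people typed each candidate reading
        self.votes = defaultdict(Counter)
        # unknown word id -> reading accepted once enough people agree
        self.solved = {}

    def submit(self, known_answer, known_guess, unknown_id, unknown_guess):
        """Return True if the form may be submitted (known word read correctly)."""
        if known_guess.strip().lower() != known_answer.strip().lower():
            # Wrong on the word we can check: reject, and discard the other guess.
            return False
        reading = unknown_guess.strip().lower()
        self.votes[unknown_id][reading] += 1
        # Keep the first reading that reaches the agreement threshold.
        if unknown_id not in self.solved and self.votes[unknown_id][reading] >= self.threshold:
            self.solved[unknown_id] = reading
        return True


challenge = TwoWordChallenge()
# Four visitors read the same smudged word; one of them misreads it.
for guess in ["morning", "moming", "Morning", "morning "]:
    challenge.submit("upon", "upon", "book42/p7/w13", guess)
# A bot that fails the known word is rejected and its guess is not counted.
accepted = challenge.submit("upon", "uqon", "book42/p7/w13", "moming")
print(accepted, challenge.solved)
```

The single misreading ("moming") never wins, because three other people agreed on "morning" and only then was the word marked as solved.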

CAPTCHAs do not entirely solve the spam problem, but they are a financial burden for spammers: automated systems cannot solve good CAPTCHAs, so some spammers rely on low-paid humans to do the dirty work for them.

It’s fine by me: poor people are getting paid to do something useful (helping digitise books) and spammers are wasting their money doing so. In my case, they lose even more, because I moderate comments before they become visible and use Akismet to detect spam, which means that however hard they try, their spam never gets anywhere.

In the fight against spammers, it makes me happy to know I’m costing them something for a change…

### References
* Breaking visual CAPTCHA: http://www.cs.sfu.ca/~mori/research/gimpy/
* Vulnerabilities of some CAPTCHA implementations (reCAPTCHA isn’t one of them): http://www.puremango.co.uk/cm_breaking_captcha_115.php
* Luis von Ahn’s website: http://www.cs.cmu.edu/~biglou/
* Lecture on human computation by Luis von Ahn (technical, but very inspiring): http://video.google.com/videoplay?docid=-8246463980976635143
* The Internet Archive, which benefits from solved reCAPTCHAs and is also an immense source of free books: http://www.archive.org/details/texts

Comments

Carl J commented 18 years ago

Hey, great writeup. Can you put up a post in a few weeks with your thoughts/findings on using reCAPTCHA on your site? Like I said in my reply to your comment on my site, I’ve received about 200 comment spams on one article after I disabled comments on it, so I’m wondering if this will help in any way.

Renaud (author) commented 18 years ago

Hi Carl, thank you for stopping by. I must say that reCAPTCHA solved my spam issues. As I said in the article, a handful of spams still get through, and Akismet stops them anyway. Without hard numbers, I would estimate that the ratio of spam getting through before and after reCAPTCHA is about 100 to 1; in fact, most spammers don’t seem to bother trying any more. Mind you, my site doesn’t get many visitors, but for some reason it used to be popular with spammers. Not any more, not any more…
