Google has confirmed that it has bought ReCaptcha, which specialises in preventing web fraud.
The Carnegie Mellon University spinoff develops software that displays distorted text on web pages as part of the sign-in process. Because the text is very difficult to scan and interpret without human intervention, it stops computers from automatically setting up bogus accounts.
Google is looking to use the technology for this purpose on some of its web services products, but is also interested because it could help with its plans to digitise the world's libraries.
"Since computers have trouble reading squiggly words like these, captchas are designed to allow humans in but prevent malicious programs from scalping tickets or obtaining millions of email accounts for spamming," said Will Cathcart, Google product manager. in a blog post.
"But there's a twist: the words in many of the captchas provided by ReCaptcha come from scanned archival newspapers and old books. Computers find it hard to recognise these words because the ink and paper have degraded over time, but by typing them in as a captcha, crowds teach computers to read the scanned text."
Google's plans to digitise the world's books, while contentious with publishers, also face significant technical challenges. While optical character recognition (OCR) technology has come a long way since the 1990s, it still has problems, particularly with outdated fonts and faded paper.
The company will now integrate ReCaptcha's technology into its OCR engines. The financial terms of the deal were not revealed.