Mollom CAPTCHAs are "intelligent"

Every other week or so, someone asks me the following question: How are Mollom CAPTCHAs better than those created by the CAPTCHA module? This is an important question, and understanding the answer is central to understanding our philosophy with Mollom.

First, when using Mollom in "text analysis" mode, a CAPTCHA is only displayed when Mollom is uncertain about whether a message could be spam. Mollom analyzes the text of comments and combines that analysis with what it knows about the internal reputation of the posters, to determine whether a message is "spammy". Non-spam submissions are accepted without a CAPTCHA, and posts that are certainly spam are rejected automatically. By only presenting a CAPTCHA when necessary, we avoid penalizing normal (non-spamming) users with CAPTCHA challenges. The CAPTCHA module is different in that it does not perform text analysis and therefore must always display a CAPTCHA challenge.
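The three outcomes described above (accept, reject, or ask for a CAPTCHA) can be sketched roughly as follows. This is only an illustration: the numeric `spam_score`, the `low` and `high` thresholds, and the function names are assumptions made for the example, not Mollom's actual scoring or API.

```python
from enum import Enum


class Verdict(Enum):
    HAM = "ham"        # confidently not spam
    UNSURE = "unsure"  # classifier cannot decide
    SPAM = "spam"      # confidently spam


def classify(spam_score: float, low: float = 0.2, high: float = 0.8) -> Verdict:
    """Map a hypothetical spam score in [0, 1] to a verdict.

    The thresholds are illustrative assumptions.
    """
    if spam_score < low:
        return Verdict.HAM
    if spam_score > high:
        return Verdict.SPAM
    return Verdict.UNSURE


def handle_submission(spam_score: float) -> str:
    """Decide what the form should do with a submission."""
    verdict = classify(spam_score)
    if verdict is Verdict.HAM:
        return "accept"        # no CAPTCHA for normal users
    if verdict is Verdict.SPAM:
        return "reject"        # blocked automatically
    return "show_captcha"      # only the uncertain middle sees a challenge
```

Only submissions that fall between the two thresholds ever trigger a challenge, which is the point of the "text analysis" mode.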

Second, the Mollom module for Drupal has a "CAPTCHA only" mode, which is useful when clients would prefer not to use text analysis, or when the forms have almost no text to analyze (like Drupal's user registration form). In "CAPTCHA only" mode, the user experience of the Mollom module is very similar to that of the CAPTCHA module -- the user is always prompted to complete a CAPTCHA in order to perform a certain operation. The similarity ends there, however. While the user experience is the same, the actual CAPTCHA handling is not. Mollom CAPTCHAs are "intelligent", in the sense that Mollom tracks the behavior and reputation of IP addresses across all sites using Mollom. A known spammer, operating from a known IP with a poor reputation, won't be able to complete a Mollom CAPTCHA no matter how hard he tries. And, as more users install Mollom, its performance improves as it learns from the additional data. A stand-alone module like CAPTCHA doesn't learn from user behavior, as it simply generates CAPTCHAs without regard to their context and delivery.
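One way to picture an "intelligent" CAPTCHA check is a verifier that looks at the submitter's reputation before it looks at the answer. The sketch below is an assumption-laden illustration: the `ip_reputation` score, the `min_reputation` cutoff, and the function name are hypothetical and do not describe Mollom's real implementation.

```python
def captcha_passes(
    answer: str,
    expected: str,
    ip_reputation: float,
    min_reputation: float = 0.3,
) -> bool:
    """Return True only if the answer is right AND the IP is trusted enough.

    ip_reputation is a hypothetical score in [0, 1]; higher means more trusted.
    """
    # A known-bad IP fails even with a correct solution.
    if ip_reputation < min_reputation:
        return False
    # Otherwise fall back to an ordinary, case-insensitive comparison.
    return answer.strip().lower() == expected.strip().lower()
```

A stand-alone CAPTCHA module would effectively run only the final comparison, so a correct answer from a known spammer would pass.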

This second difference between Mollom and other CAPTCHA modules is, in fact, huge. When we analyze our server logs, we see that 20% of all correctly completed CAPTCHAs are submitted by known spammers. Spammers don't seem to solve CAPTCHAs algorithmically; instead, they use botnet-infected machines to persuade humans to solve CAPTCHAs for them. Two blog posts that detail this process are How to defeat Koobface and Breaking Koobface's CAPTCHA solving process. As spammers evolve and their arsenal of tools becomes increasingly powerful, CAPTCHA solutions must keep up to remain effective. We believe Mollom's "intelligent CAPTCHA" processing represents a significant improvement over traditional CAPTCHA generation and is one way we'll continue to stay a step ahead in our goal to eliminate posting spam.

Spammers set up shop at IBM.com

In this post we show one way spammers operate, using organizations like IBM to spam others. We show how spammers abuse an IBM wiki to create an online pharmacy via IBM.com. Once a page like this is in place, the spammers try to insert links to it in blog comments and community websites across the web to drive traffic to it and to improve their search engine ranking.

While IBM is not using Mollom (maybe they should?), many of the targeted blogs and community websites are, which is how we discovered this problem. Watch the video below for more details. This problem is not specific to IBM. We found hundreds of similar spam pages on Google Sites, Yahoo, Amazon.com, Wordpress.com, Ning, Tripod.com, Xanga.com and more.

I recorded this video last week, and notified IBM about the existence of these spam pages several days ago.

New features for Drupal's Mollom module

It's been a year since our last significant changes to the Mollom module for Drupal, so we're excited to announce a major update. The most exciting new features in the new Mollom 1.11 release include:

  • Enables use of Mollom for any form, with Webform module integration almost completed
  • Includes an embedded audio CAPTCHA player that works on all browsers
  • Adds optional blacklist functionality to block spammers by text patterns and URLs
  • Adds an optional link to Mollom's privacy policy from forms protected via textual analysis
  • Adds support for serving CAPTCHAs over SSL (for Mollom Plus and Mollom Premium customers only)
  • Makes Mollom compatible with Pressflow
  • Integrates with CCK to change the position of the CAPTCHA field
  • Provides many more unit tests for continuous integration testing
  • Improves the APIs to enable module developers to better integrate with the Mollom module
  • Implements various usability improvements (e.g., better permission names and better error messages)
  • Updates several of the translations

We spent several months working on this release, and it marks our biggest upgrade to date. Upgrading requires some database updates, so remember to run update.php.

Thanks to Daniel F. Kudwien (sun), Dave Reid and Keith Smith for their contributions to the 1.11 release.

Mollom blacklisting and language detection APIs

If you're a regular reader of this blog, you know that Mollom is a continual work in progress. By studying how people use Mollom, by listening to feature requests, and by examining the plugins that our software partners and others have made available, we've introduced new ways to interact with Mollom.

First, we're announcing support for blacklisting. We introduced two new methods: one based on detecting the presence of user-specified URLs, and another that detects specific phrases or keywords. In both cases, Mollom maintains custom, site-specific URL and text blacklists, and knows to search for the presence of these links or phrases when analyzing text for your site. We're adding support for this API to the next version of the Mollom module for Drupal.
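To make the two blacklist types concrete, here is a minimal sketch of how a site-specific URL blacklist and text blacklist might be checked against a post. It is not the Mollom API: the function name, the argument names, and the matching rules (case-insensitive substring match for URLs, whole-word match for phrases) are assumptions chosen for illustration.

```python
import re


def find_blacklisted(text, url_blacklist, text_blacklist):
    """Return a list of (kind, entry) pairs for every blacklist hit in text."""
    lowered = text.lower()
    hits = []

    # URL entries: simple case-insensitive substring match.
    for url in url_blacklist:
        if url.lower() in lowered:
            hits.append(("url", url))

    # Text entries: match whole words or phrases only, so that
    # "pills" does not fire on "pillsbury".
    for phrase in text_blacklist:
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        if re.search(pattern, lowered):
            hits.append(("text", phrase))

    return hits
```

For example, calling `find_blacklisted(post, ["example.com"], ["cheap pills"])` on a post that links to example.com and mentions cheap pills would report both entries, and a site could then reject the post or hold it for moderation.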

Second, we've implemented a new method that detects the language of any given text. We currently support detection of about 75 languages and this new functionality allows our end-users to take action based on posting language. It could be used to help segment web postings into different forums by language, or to help moderate the languages spoken on your site, for instance. The language detection API is used by some of our customers, but probably won't make it into the next version of the Mollom module for Drupal.

We've got other new features that we're working on as well, and will introduce them as they're ready. In the meantime, I'm excited to see what our plugin developers do with this new level of control.

Mollom 2009 retrospective

For Mollom, milestones came fast in 2009. First, we celebrated that Mollom had blocked 25 million spam messages. Two months later we celebrated that Mollom had blocked 50 million spam messages. Fast forward another 3 months, and Mollom passed the 100 million spam messages milestone. We ended the year with 163 million spam messages blocked (700% annual growth, up from 21 million in 2008) and 15,000 sites actively using Mollom (330% annual growth, up from 4,500 sites in 2008).

Not included in any of those statistics is the fact that Mollom partnered with Netlog in 2009, one of the fastest-growing web communities in Europe. Mollom is now protecting the messages of more than 40 million Netlog members, in more than 25 different languages. Each day, Netlog members exchange more than 4 million messages, all analyzed by Mollom for spam and unwanted content in real-time.

Last year, we spent a lot of time dealing with the pains of, frankly, our unexpected growth. We're handling well over 200 million HTTP requests each month, making Mollom the largest web service I've ever helped build -- a very fun and rewarding experience from the technology side. We launched additional servers and rewrote our backend infrastructure to improve scalability and ease of management. True success is measured by the fact that we had to purchase solid-state drives (SSDs) because we needed read and write times at least 100 times faster than regular hard disks could deliver. ;)

But best of all, on the business side, we were able to increase our investments while steering the company to profitability. That is a big win, because it proves that the business model works.

I predict that in 2010 we'll continue to do much of the same but that you'll also see some more "visible" changes -- maybe a new website, and almost certainly some new APIs and functionality to better combat spam. Blocking spam is a really hard problem, and spammers continue to adapt and refine their techniques. We have more work to do, but are committed to winning the spam game. Spam is only part of the problem in website moderation, though, which is why I expect that in 2010 Mollom will start providing solutions for other aspects such as language, content quality, profanity and malicious content.

Will we grow as fast in 2010? Only time will tell. We're a very small company, but Mollom has barely scratched the surface of its potential, so I have every reason to believe that 2010 will be another great year for Mollom.