The Spam War

As things on the internet go, not many are worse than spammers. At best they're a nuisance; at worst, the hardware, cycles and power wasted on dealing with them contribute to global warming and pollution at an ever-accelerating rate.

I'm most annoyed by the wasted cycles. When I run a website, I'd like the CPU power I pay for to be used to serve (admittedly mediocre) content to people who might get some (questionable) value out of it.

I use the Mollom service to check all comments posted to this blog for spam, so some of the detection is outsourced. But when I looked at the server logs, I found that most spammers (be they bots or humans that would fail a Turing test) try to post spam from the same IP address as many as 15 times in a relatively short period. And that got me thinking.

To help stem the tide of that other great source of spam, email, I use a utility called fail2ban. It watches my mail server's log files for messages indicating that an email was classified as spam. When it finds one, it extracts the spammer's IP address and adds it to the firewall for an hour.

Why not, asked my brain, do this with the Drupal logs? Why not, indeed! Blocking spam at the firewall level is nice and cheap: a banned address never gets a request through to the web server, let alone runs any PHP code. All the spammer notices is that they can suddenly no longer connect to the website.

Oi, get on with it...

Configure your log

Drupal can be configured to send all its log messages to the Linux system log. All you need to do is turn on the syslog module, which is part of Drupal core.

When done, visit the Configuration » Development » Logging and errors page and check that you like the settings. There are two settings here that affect how your setup will work.

First, the Syslog facility option lets you route Drupal's log messages to a specific log file via the system logger. That's helpful for setting access permissions and for keeping the log directory tidy. Unless you're already using the LOCAL0 facility for some other service, you can leave it unchanged.

Second, the Syslog format option specifies how log messages are written to disk. For privacy reasons, you may not want to log all the data the default format includes. However, you will need at least !ip and !message to make fail2ban work, and the filter regex used below also relies on !type appearing right before !ip. I just use the default.
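To see why those placeholders matter, here is a minimal Python sketch of how a !placeholder format string becomes one pipe-separated log line. It assumes the Drupal 7 default format is `!base_url|!timestamp|!type|!ip|!request_uri|!referer|!uid|!link|!message`, which matches the sample log entry further down. The `render()` helper and the example message text are hypothetical, for illustration only.

```python
import re

# Assumed Drupal 7 default syslog format (it matches the sample log entry below).
DEFAULT_FORMAT = "!base_url|!timestamp|!type|!ip|!request_uri|!referer|!uid|!link|!message"


def render(fmt, entry):
    """Hypothetical helper: replace every !key in fmt with entry[key].

    Substitution happens in a single pass, trying the longest key first,
    so replaced values are never scanned again (similar to PHP's strtr()).
    """
    if not entry:
        return fmt
    keys = sorted(entry, key=len, reverse=True)
    pattern = re.compile("|".join("!" + re.escape(key) for key in keys))
    return pattern.sub(lambda m: str(entry[m.group(0)[1:]]), fmt)


# Values taken from the sample log entry; the message text is a placeholder.
entry = {
    "base_url": "http://cafuego.net",
    "timestamp": 1361573941,
    "type": "mollom",
    "ip": "199.83.95.106",
    "request_uri": "http://cafuego.net/comment/reply/425",
    "referer": "http://cafuego.net/2012/08/14/date-ordinals-ugly-solution-ugly-problem",
    "uid": 0,
    "link": "",
    "message": "Spam: example comment text",
}

line = render(DEFAULT_FORMAT, entry)
print(line)

# Without !ip in the format, fail2ban would have no address to ban.
print(render("!type|!message", entry))
```

The fourth pipe-separated field of the rendered line is the visitor's IP address; that is the field fail2ban extracts and bans.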

As soon as you turned on the syslog module, Drupal started sending all its watchdog messages to the system log daemon. Since you haven't told the daemon what to do with them yet, they end up in the standard catch-all log file, which is /var/log/syslog on Ubuntu or Debian. To fix that, tell your system logger to write all LOCAL0 messages to separate files, either by dropping the attached rsyslog.drupal.conf snippet into the /etc/rsyslog.d/ directory or by adding the following lines to /etc/rsyslog.conf:

local0.crit;local0.err       -/var/log/drupal/drupal.err
local0.info                  -/var/log/drupal/drupal.info
local0.*                     -/var/log/drupal/drupal.log

You need to restart the system logger service for this change to take effect.

$ sudo service rsyslog restart

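You should now have a /var/log/drupal/ directory with three files in it. drupal.log receives every Drupal message, drupal.info receives messages of info severity and above, and drupal.err receives messages of err(or) severity and above, including crit(ical) ones. The separate files aren't strictly necessary, but are handy to have when you're chasing a WSOD (White Screen of Death).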
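The selectors in those rsyslog rules are cumulative: with standard syslog semantics, a priority selects that severity and everything more severe. The Python sketch below models this to show which of the three files a message ends up in. It is simplified: it only handles plain positive selectors (no `none`, `=` or `!` modifiers), and it ignores the leading `-` on each path, which only controls syncing. The `selector_matches()` and `destinations()` helpers are hypothetical.

```python
# Simplified model of classic syslog selector matching. Assumption: only plain
# "facility.priority" selectors joined with ";" (no "none", "=" or "!"
# modifiers). A priority selects that severity and everything more severe.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# The three rules above, minus the leading "-" on each path.
RULES = [
    ("local0.crit;local0.err", "/var/log/drupal/drupal.err"),
    ("local0.info", "/var/log/drupal/drupal.info"),
    ("local0.*", "/var/log/drupal/drupal.log"),
]


def selector_matches(selector, facility, severity):
    """Return True if a message with this facility/severity matches selector."""
    for part in selector.split(";"):
        sel_facility, _, sel_priority = part.partition(".")
        if sel_facility not in ("*", facility):
            continue
        if sel_priority == "*":
            return True
        # Lower index means more severe, so "err" also catches crit/alert/emerg.
        if SEVERITIES.index(severity) <= SEVERITIES.index(sel_priority):
            return True
    return False


def destinations(facility, severity):
    """Hypothetical helper: list the files a message would be written to."""
    return [path for selector, path in RULES if selector_matches(selector, facility, severity)]


for level in ("crit", "err", "warning", "info", "debug"):
    print(level, "->", destinations("local0", level))
```

One consequence: the `local0.crit` part of the first rule is already covered by `local0.err`. If you want a file that holds only info messages, rsyslog's `local0.=info` selector matches that single severity.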

Logging done, now find the spammer

Next, install fail2ban and tell it to keep an eye on your Drupal log and act when it finds a spammer. In my case, a spammer is anyone whose comment Mollom classifies as spam, and fail2ban spots them by matching a regular expression against /var/log/drupal/drupal.log.

You can drop the attached drupal-mollom-spam.conf into /etc/fail2ban/filter.d/ to accomplish that. A typical caught spam notification starts like this:

Feb 22 22:59:01 cappucino drupal: http://cafuego.net|1361573941|mollom|199.83.95.106|http://cafuego.net/comment/reply/425|http://cafuego.net/2012/08/14/date-ordinals-ugly-solution-ugly-problem|0||Spam: Twerciffigide    Encumesoceege   h...

The regular expression fail2ban uses both to match such a log message and to extract the spammer's IP address is:

\|mollom\|<HOST>\|.*\|Spam:
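fail2ban replaces the `<HOST>` tag with its own host-matching pattern and reads the captured address from the match. Here is a minimal Python sketch of that idea, run against the sample entry above (joined onto one line and cut short). The `HOST_PATTERN` below is a simplified stand-in that only matches dotted IPv4 addresses. It is not fail2ban's real expansion. `compile_failregex()` and `spammer_ip()` are hypothetical helper names.

```python
import re
from typing import Optional

# Simplified stand-in for fail2ban's <HOST> tag: a dotted IPv4 address captured
# in a group named "host". fail2ban's real expansion is more permissive.
HOST_PATTERN = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"
FAILREGEX = r"\|mollom\|<HOST>\|.*\|Spam:"


def compile_failregex(failregex):
    """Hypothetical helper: expand <HOST> and compile the result."""
    return re.compile(failregex.replace("<HOST>", HOST_PATTERN))


def spammer_ip(line, regex) -> Optional[str]:
    """Return the captured address if the line matches, else None."""
    match = regex.search(line)
    return match.group("host") if match else None


# The sample entry joined onto one line (and cut short).
SAMPLE = (
    "Feb 22 22:59:01 cappucino drupal: http://cafuego.net|1361573941|mollom|"
    "199.83.95.106|http://cafuego.net/comment/reply/425|"
    "http://cafuego.net/2012/08/14/date-ordinals-ugly-solution-ugly-problem|"
    "0||Spam: Twerciffigide"
)

regex = compile_failregex(FAILREGEX)
print(spammer_ip(SAMPLE, regex))  # prints 199.83.95.106
```

To check the real filter against your real log, fail2ban ships a `fail2ban-regex` tool that takes a log file and a filter file, e.g. `fail2ban-regex /var/log/drupal/drupal.log /etc/fail2ban/filter.d/drupal-mollom-spam.conf`.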

Spamwall

Next up, fail2ban needs a bit of configuration telling it to actually use that filter and what to do when it finds a match. Such a configuration is called a jail, and you can find mine attached as drupal-mollom-spam.jail. Add its contents to your /etc/fail2ban/jail.conf file and customise it as required. It's pretty important that you set ignoreip to your own IP address. That will stop fail2ban from firewalling you out of your website should you decide to post a spam comment.

The bantime setting specifies the number of seconds the spammer's IP address remains firewalled; mine is set to a day (86400 seconds). The other setting to note is maxretry: the number of matches (comments classified as spam) fail2ban must see from an IP address before it adds that address to the firewall. By setting it to 1, humans can still solve a captcha and post their comment. Bots can't, and get banned.
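The way ignoreip, maxretry and bantime interact can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not fail2ban's implementation. It also uses findtime, a real jail option giving the window (in seconds) within which matches are counted; the value of 600 here is an arbitrary example. The `SimpleJail` class and the ignored address 203.0.113.5 (a documentation-only IP) are hypothetical. This ignoreip only does exact matches, whereas the real option also accepts CIDR ranges.

```python
from collections import defaultdict, deque


class SimpleJail:
    """Toy model of a fail2ban jail (not fail2ban's actual code).

    An address is banned once it produces `maxretry` filter matches within
    `findtime` seconds; the ban is lifted `bantime` seconds later. Addresses
    in `ignoreip` (exact matches only here) are never banned.
    """

    def __init__(self, maxretry, findtime, bantime, ignoreip=()):
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self.ignoreip = set(ignoreip)
        self.matches = defaultdict(deque)  # ip -> times of recent matches
        self.banned_until = {}             # ip -> time the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, float("-inf")) > now

    def record_match(self, ip, now):
        """Register one filter match at time `now`; return True if it bans ip."""
        if ip in self.ignoreip or self.is_banned(ip, now):
            return False
        times = self.matches[ip]
        times.append(now)
        # Forget matches older than the findtime window.
        while times and times[0] <= now - self.findtime:
            times.popleft()
        if len(times) >= self.maxretry:
            self.banned_until[ip] = now + self.bantime
            times.clear()
            return True
        return False


# The settings described above: ban on the first spam match, for one day.
# findtime=600 and the ignored address 203.0.113.5 are made-up examples.
jail = SimpleJail(maxretry=1, findtime=600, bantime=86400, ignoreip=["203.0.113.5"])
print(jail.record_match("199.83.95.106", now=0))    # True: banned straight away
print(jail.is_banned("199.83.95.106", now=3600))    # True: still banned an hour later
print(jail.is_banned("199.83.95.106", now=86400))   # False: the day is up
print(jail.record_match("203.0.113.5", now=0))      # False: ignoreip wins
```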

You're now all set. Restart fail2ban and keep an eye on its log:

$ sudo service fail2ban restart
$ tail -f /var/log/fail2ban.log
2013-02-22 22:27:49,770 fail2ban.actions: WARNING [drupal-mollom-spam] Ban 175.44.5.35
2013-02-22 22:30:59,079 fail2ban.actions: WARNING [drupal-mollom-spam] Ban 5.254.146.52
2013-02-22 22:34:03,360 fail2ban.actions: WARNING [drupal-mollom-spam] Ban 198.143.133.166
2013-02-22 22:55:14,118 fail2ban.actions: WARNING [drupal-mollom-spam] Ban 144.76.6.218
2013-02-22 22:55:43,181 fail2ban.actions: WARNING [drupal-mollom-spam] Ban 199.180.113.6
2013-02-22 22:59:02,477 fail2ban.actions: WARNING [drupal-mollom-spam] Ban 199.83.95.106
2013-02-22 23:08:41,273 fail2ban.actions: WARNING [drupal-mollom-spam] Ban 195.190.13.169

Look at that, all the spammers going right into the firewall, excellent! Use ctrl-c to quit tail when you get bored watching software firewall other software :-)
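If you'd rather get a summary than watch the log scroll by, a short Python sketch can pull the timestamp, jail name and banned address out of each Ban line. It assumes the exact line format shown above. Other fail2ban versions may lay these lines out differently, so treat the regular expression as a starting point. `parse_bans()` is a hypothetical helper.

```python
import re

# Assumes the line format shown above, e.g.
# 2013-02-22 22:27:49,770 fail2ban.actions: WARNING [drupal-mollom-spam] Ban 175.44.5.35
BAN_LINE = re.compile(
    r"^(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+fail2ban\.actions:\s+"
    r"\w+\s+\[(?P<jail>[^\]]+)\]\s+Ban\s+(?P<ip>\S+)\s*$"
)


def parse_bans(lines):
    """Hypothetical helper: yield (timestamp, jail, ip) for every Ban line.

    "Unban" lines do not match, because the pattern expects "Ban" right
    after the bracketed jail name.
    """
    for line in lines:
        match = BAN_LINE.match(line)
        if match:
            yield match.group("when"), match.group("jail"), match.group("ip")


sample = [
    "2013-02-22 22:27:49,770 fail2ban.actions: WARNING [drupal-mollom-spam] Ban 175.44.5.35",
    "2013-02-22 22:30:59,079 fail2ban.actions: WARNING [drupal-mollom-spam] Ban 5.254.146.52",
]
for when, jail, ip in parse_bans(sample):
    print(when, jail, ip)
```

For a per-jail count, you could pass the opened /var/log/fail2ban.log (usually readable only by root) to `parse_bans()` and feed the jail names to `collections.Counter`.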

Tidying up

Finally, there is a little bit of configuration you should do to keep the /var/log/drupal directory tidy and prevent it from filling your server's disk. The logrotate utility should be installed already, but you can easily install it if that's not the case.

To tell it to compress and archive your new Drupal logs each day and keep at most two weeks' worth of each, drop the attached logrotate.drupal.conf file into your /etc/logrotate.d/ directory.

And that's it. You're done! Spam won't magically stop happening, but your system should now carry a reduced load from spam attempts, leaving more resources for what it was intended to do.

Update

This rule (and a few others) is now included in the 7.x-2.x version of the fail2ban Drupal module.

Comments

Hey thank you, it really helped me. My site has a huge amount of spam attempts, and your solution works. I hope it'll reduce spam :)