Server-side Bogofilter with Dovecot, the simple way

Posted on 13 January 2013, in Boulot, Documentation, Informatique, Journal, Libre Software, Licorn®, LXC, System administration

Given that I already have a local dovecot setup with IMAP and Maildirs, I just wanted to add a local SPAM filter so I no longer have to sort junk by hand. I should have been doing this for ages, but my recent configuration / hosting changes disabled the whole thing again, and I had been quite lazy about putting it back in place. I have no local postfix setup at all, because:

  • all incoming mail goes through the simple getmail program, which pushes messages straight to dovecot's deliver,
  • and outgoing mail goes directly from the local machine to my central server via nullmailer. Only the central server has postfix running and fully set up. Desktop clients have roughly the same outgoing configuration as the postfix machine, via my hosting provider's servers.

I’m on Ubuntu Server 12.04 LTS, so I deliberately won’t spend time detailing the obvious sudo apt-get install bogofilter command on the local host.

From my user account's point of view, the enhanced getmail configuration (taken from here) looks like this:

[retriever]
type     = SimplePOP3SSLRetriever
server   =
username = xxxxxxxxx
password = xxxxxxxxx

[destination]
type = MDA_external
path = /usr/bin/bogodeliver

And the /usr/bin/bogodeliver shell script is dead simple:

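#!/bin/sh
# Deliver mails using Dovecot LDA
# by first passing them through Bogofilter:
#   -u updates the wordlists, -e exits 0 for both spam and ham,
#   -p passes the message through with an X-Bogosity header added

/usr/bin/bogofilter -u -e -p | /usr/lib/dovecot/deliver "$@"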
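
Before relying on the wrapper, it can help to look at the header the filter stage adds. The following is only a minimal sketch: the function name bogosity_of and the FILTER variable are made up for illustration, and -u is deliberately left out so that a manual check does not train the wordlists.

```shell
#!/bin/sh
# Hypothetical helper: show the X-Bogosity header that the filter stage
# would add to a message. FILTER is an illustrative variable; by default
# it runs the same bogofilter stage as bogodeliver, minus -u (no training).
FILTER=${FILTER:-"/usr/bin/bogofilter -e -p"}

bogosity_of() {
    # $1: path to a file holding one complete mail message.
    # sed stops at the first empty line, so only headers are inspected.
    $FILTER < "$1" | sed -n '/^$/q; /^X-Bogosity:/p'
}
```

Calling bogosity_of on a message file taken from one of the Maildirs prints the header line that the mailbox rules can then match on.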

There are 2 remaining configuration points:

  • The first is done on the client side, via the sieve protocol/extension. In my current setup, I simply add 2 rules in the roundcube webmail: one for sure SPAM (e.g. X-Bogosity == Yes mails go to the Junk mailbox) and one for unsure messages (e.g. X-Bogosity == Unsure mails go to the Unsure mailbox). Because the rules are stored via sieve, they are enabled globally for all my clients, on the server side. Brain-relaxing!
  • The second is the bogofilter trainer script (SH, ~550b), which is only a few lines of shell. It iterates over all my mailboxes, learning their contents as SPAM or HAM depending on the mailbox names. It is scheduled via cron once a day (but once a week could suffice if you don’t receive that much mail).
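
For readers who cannot grab the trainer script, here is a rough sketch of the idea, not the original script. It assumes a Maildir++ layout under ~/Maildir, that SPAM is filed in ".Junk", and that ".Unsure" should be skipped until a human has sorted it; the function names and the BOGOFILTER / MAILDIR variables are illustrative. It only uses bogofilter's -s (register as spam) and -n (register as non-spam) options.

```shell
#!/bin/sh
# Sketch of a trainer: walk every Maildir folder and register its messages
# with bogofilter as spam or ham, based on the folder name.
# Assumptions (not taken from the original script): Maildir++ layout,
# spam lives in ".Junk", ".Unsure" is left alone.
BOGOFILTER=${BOGOFILTER:-/usr/bin/bogofilter}
MAILDIR=${MAILDIR:-$HOME/Maildir}

# Print the bogofilter flag for a folder path, or nothing to skip it.
flag_for_folder() {
    case "$(basename "$1")" in
        .Junk|.Junk.*)     printf '%s\n' "-s" ;;  # register as spam
        .Unsure|.Unsure.*) ;;                     # not sorted yet: skip
        *)                 printf '%s\n' "-n" ;;  # anything else is ham
    esac
}

# Feed every message found in a "cur" directory to bogofilter.
train_all() {
    find "$MAILDIR" -type d -name cur | while IFS= read -r dir; do
        flag=$(flag_for_folder "$(dirname "$dir")")
        [ -n "$flag" ] || continue
        for msg in "$dir"/*; do
            [ -f "$msg" ] || continue
            "$BOGOFILTER" "$flag" < "$msg"
        done
    done
}
```

A script built this way would end with a call to train_all and be run from cron. Note that this naive version registers every message again on each run, and that messages already registered at delivery time by -u are counted too; a real trainer has to keep track of what was already learned, or use bogofilter's -N / -S options to undo an earlier registration.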

With this setup, I can classify my mail however I like from any client, without worrying about each client's individual SPAM configuration or how much it has already learned. I currently use 3 instances of Apple Mail, 3 instances of Mozilla Thunderbird on different machines, and 2 different webmails, all via IMAP. Keeping all of them at the right level of SPAM knowledge would be painful.

The daily trainer will always pick up new SPAM and historic HAM, whichever client was used to sort it, and does so once and for all.