Sunday, March 15, 2009

Thunderbird's spam & junkmail filtering

As my avid blog readers will be aware, I recently had to rebuild my mail server from scratch, and being the big job that it is, I've been taking care of it in chunks.
SMTP first, IMAP/POP second; content/spam filtering is still to come.

Postfix has sender address verification enabled, along with a few strict smtpd checks, and this works well to filter out the really obvious stuff from many zombies. But as expected, with dspam out of the picture I was getting spam coming through to my inbox.

I use Thunderbird on the desktop, and while working up the energy to have a good run at setting up proper spam filtering server-side, I decided to see if Thunderbird's junkmail filtering system actually did anything useful. It was quick to set up and took minimal effort to train, so I figured it wouldn't hurt to give it a shot.

To give some context, I'm not getting a lot of spam hitting my inbox (compared to many); we're only talking about 30-40 messages per day. That's enough to be annoying when you're used to getting only 1 every 2-3 days, as was the case with dspam in charge.

I enabled Thunderbird's junkmail feature and started tagging spam by hand to train it. I configured it to move junk mail to a /junk folder, but I didn't have it marked as read, as I wanted to get a visual handle on how it was doing.

Fairly early on in the training I noticed that around half of my spam had a common trait: it was all both to and from my own email address.
Since I only email myself (using the same to/from address pair) when testing something, this immediately lent itself to a very simple and obvious mail filtering rule:

If mail is
from: me@me.com and
to: me@me.com, then

1. Mark it as junk
2. Move it to the junk folder
3. Mark it as read

This very simple rule provided some automatic training data, and cleared my inbox of obvious junk without requiring any intelligence on the part of the junk-mail engine.
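
For anyone who would rather see the logic of that rule written out, here is a minimal sketch in Python. It is only an illustration of the from/to check, not how Thunderbird implements its filters internally, and the function name `is_self_addressed` and the `MY_ADDRESS` constant are names I made up for the example. It assumes the raw message text is available as a string.

```python
from email import message_from_string
from email.utils import getaddresses

# Placeholder address, matching the rule above; substitute your own.
MY_ADDRESS = "me@me.com"


def is_self_addressed(raw_message: str, my_address: str = MY_ADDRESS) -> bool:
    """Return True when my_address appears in both the From and To headers."""
    msg = message_from_string(raw_message)
    # getaddresses() parses header values into (display name, address) pairs.
    senders = {addr.lower() for _, addr in getaddresses(msg.get_all("From", []))}
    recipients = {addr.lower() for _, addr in getaddresses(msg.get_all("To", []))}
    me = my_address.lower()
    return me in senders and me in recipients


if __name__ == "__main__":
    sample = "From: me@me.com\nTo: me@me.com\nSubject: test\n\nhello\n"
    if is_self_addressed(sample):
        # In Thunderbird the filter actions would be: mark as junk,
        # move to the junk folder, mark as read.
        print("would be filed as junk")
```

The comparison is case-insensitive and ignores display names, so a forged "Me <ME@me.com>" sender would still match. A message I genuinely send to myself would match too, which is the one trade-off of the rule.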

I've been using Thunderbird's filtering for about a week now, and so far I've only had 1 false positive, and maybe 5 false negatives.

All things considered, I'm very impressed with the performance, though the very small set of messages and the short training period are big factors. So much so that I'm wondering just how much work it's worth putting into a server-side anti-spam system.

Good job Mozilla!

If, like me, you assumed that a client-side junkmail filter is likely something of a toy, I encourage you to actually give it a shot. I'm a convert!
