Bayes Filtering in SpamAssassin

The Bayesian classifier in SpamAssassin began tagging emails a few days ago. I found this out because while messages were not marked as spam, my procmail rule started diverting all messages to my spam folder. The old rule was not particular about where the yes was and since BAYES contains yes, all emails looked like spam. The new rule only looks for the yes at the beginning.

# Old Rule
:H
* ^X-Spam-Status:.*Yes
$MAIL/spam
# New Rule
:H
* ^X-Spam-Status: Yes
$MAIL/spam

Now incoming spam messages contain an additional score in the spam report.

X-Spam-Report:
        *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
        *      [score: 1.0000]

I was surprised that it took the Bayes filter three months to gather enough email to begin scoring incoming email. It is a nice addition because it bumps up the spam scores enough to ensure that more messages that are spam get marked as such.

Configure SpamAssassin with Postfix on Ubuntu

I’ve been running a mail server for the last year and a half. When I initially set up my Postfix mail server on Ubuntu, I knew that eventually I would need to add a spam filter. I recently decided that SpamAssassin was the best choice to filter email on my mail server.

I now receive on average more than one spam message each day. Interestingly, all of my spam is sent to an email address that I have only given out to Marquette University. I guess that means they have either sold my email address or poorly secured it in their database. Neither would surprise me.

I used the content from two different tutorials to get SpamAssassin up and running on my server.

First, I installed SpamAssassin.

apt-get install spamassassin spamc

Next, I created the spamd user and group. You can specify a specific uid and gid if you want.

groupadd spamd
useradd -g spamd -s /bin/false -d /var/log/spamassassin spamd

Then I created the spamd home directory and set the permissions.

mkdir /var/log/spamassassin
chown spamd:spamd /var/log/spamassassin

Then I set up some configuration for SpamAssassin. You can edit the file directly, but I use Sed so that I can automate the installation process in a script. This enables SpamAssassin, Cron, and some other options.

DEFAULT_SPAMASSASSIN=/etc/default/spamassassin
mv $DEFAULT_SPAMASSASSIN $DEFAULT_SPAMASSASSIN.default
sed '
    s/ENABLED=0/ENABLED=1/
    s/CRON=0/CRON=1/
    s/^OPTIONS.*/SAHOME="\/var\/log\/spamassassin"\nOPTIONS="--create-prefs --max-children 5 --username spamd -H ${SAHOME} -s ${SAHOME}\/spamd.log"/
' $DEFAULT_SPAMASSASSIN.default > $DEFAULT_SPAMASSASSIN

Then I set up the rest of the configuration for SpamAssassin. I initially set the required score to 2.0, but this caused a lot of legitimate emails (ham) to be marked as spam. The following configuration will rewrite subjects of spam messages to identify them as spam.

SA_LOCAL_CF=/etc/spamassassin/local.cf
mv $SA_LOCAL_CF $SA_LOCAL_CF.default
echo "
rewrite_header Subject [***** SPAM _SCORE_ *****]
required_score           5.0
# to be able to use _SCORE_ we need report_safe set to 0
# If this option is set to 0, incoming spam is only 
# modified by adding some \"X-Spam-\" headers and no 
# changes will be made to the body.
report_safe     0

# Enable the Bayes system
use_bayes               1
use_bayes_rules         1
# Enable Bayes auto-learning
bayes_auto_learn        1

# Enable or disable network checks
skip_rbl_checks         0
use_razor2              0
use_dcc                 0
use_pyzor               0
" > $SA_LOCAL_CF

Now that I have been running the spam filter for a couple weeks, I have had to whitelist some email addresses that send me emails with strange headers or get sent from “shady” IP addresses. This goes into the same local.cf file.

whitelist_from *@hq.acm.org

I find it amusing that emails from the ACM keep getting marked as spam. Next I started SpamAssassin.

/etc/init.d/spamassassin start

Next, I modified Postfix to send emails through the SpamAssassin filter.

POSTFIX_MASTER_CF=/etc/postfix/master.cf
mv $POSTFIX_MASTER_CF $POSTFIX_MASTER_CF.default
sed 's/smtp      inet  n       -       -       -       -       smtpd/smtp      inet  n       -       -       -       -       smtpd\n\t-o content_filter=spamassassin/'  \
$POSTFIX_MASTER_CF.default > $POSTFIX_MASTER_CF
echo 'spamassassin unix -     n       n       -       -       pipe
  user=spamd argv=/usr/bin/spamc -f -e    
  /usr/sbin/sendmail -oi -f ${sender} ${recipient}' >> $POSTFIX_MASTER_CF

Next, reload Postfix so it will use SpamAssassin.

/etc/init.d/postfix reload

Once SpamAssassin is running, you can train it by passing it spam and ham emails.

sa-learn -u spamd --spam --mbox /path/to/spam_mbox
sa-learn -u spamd --ham --mbox /path/to/ham_mbox

After adjusting the spam threshold, training the filter with spam messages that I have acquired over the last year, and whitelisting a few problematic senders, my spam filter has been doing a good job of marking spam as spam. At this point it is easy enough to sort through the email manually and confirm that they are spam. In the future, if it ever gets bad enough, I will be able to automatically delete the messages or filter them into a different mailbox on delivery.

Configuring a Mail Server with Postfix

I gave a presentation about Postfix to the Marquette University ACM chapter. It should be a useful starting point for configuring a Postfix mail server. I include details about much of the configuration including canonical maps which was something that I initially found difficult to figure out.

Rather than creating my presentation in Microsoft PowerPoint or OpenOffice.org Impress, I decided to check out the LaTeX Beamer package for my presentation. Since I’ve been using LaTeX for a while, it wasn’t too difficult to figure out, and it worked particularly well because most of the presentation is Postfix configuration text which I could easily include in the presentation.

The presentation is available as a PDF for download.