spamassassin: learning from inbox and junk folders

while tweaking the spam filters on our mail server, i finally took the step of adding a cron job that learns from each user's inbox and junk folders. since we use spamassassin as part of our spam defense, this basically involves a couple of sa-learn invocations to

  • learn the “ham” from each user's inbox folder
  • learn the “spam” from each user's junk folder

below is the shell script that gets invoked by cron once a day:

#!/bin/bash
echo "updating spamassassin bayesian spam/ham filter"
echo
for userDir in /home/*; do

    user=$(basename "$userDir")
    # bash does not brace-expand inside a plain assignment, so collect the paths in arrays
    ham=("$userDir"/Maildir/{cur,new})
    spam=("$userDir"/Maildir/.Junk/{cur,new})

    echo "    learning from $user"

    echo "       spam: ${spam[*]}"
    /usr/bin/sa-learn --no-sync --spam "${spam[@]}" | while IFS= read -r line; do
        echo "            $line"
    done

    echo "        ham: ${ham[*]}"
    /usr/bin/sa-learn --no-sync --ham "${ham[@]}" | while IFS= read -r line; do
        echo "            $line"
    done

    echo
done
echo "syncing:"
/usr/bin/sa-learn --sync | while IFS= read -r line; do
    echo "    $line"
done
echo
echo "stats:"
/usr/bin/sa-learn --dump magic | while IFS= read -r line; do
    echo "    $line"
done

the while read line; do ... done bits are there so that i can nicely indent the output of sa-learn.
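
since the same loop shows up several times, it could also be pulled into a small helper. the sketch below is only one possible way to do it: the function name indent is made up for illustration and is not part of sa-learn or spamassassin, and the printf call just stands in for real sa-learn output.

```shell
#!/bin/bash

# hypothetical helper: prefix every line read from stdin with the given string.
# IFS= keeps leading whitespace, -r keeps backslashes untouched, and the
# extra [ -n "$line" ] check handles a last line without a trailing newline.
indent() {
    local prefix=$1
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s%s\n' "$prefix" "$line"
    done
}

# stand-in for something like: /usr/bin/sa-learn --sync | indent "    "
printf 'first line\nsecond line\n' | indent "    "
```

with a helper like this, each sa-learn call in the script would shrink to a single pipeline, and the indentation width is passed in as an argument instead of being repeated in every echo.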

works rather nicely.