Blocking Spam

Thu Dec 28 04:05:39 UTC 2006

From: "Kevin Martin" <kevintm at ameritech.net>

>>> On Wed, 2006-12-27 at 11:56 +0000, James Wilkinson wrote:
>>>> jdow wrote:
>>>>> Besides, WTF good is Bayes with image spam?
>>>> Actually, I find that SA's Bayesian engine is pretty good at spotting
>>>> the random text they put in most image spam. With a few extra points for
>>>> technical stuff, most of it goes directly to the spam folder, and the
>>>> rest to the "unsure" folder with fairly high scores.
>>>>
>>>> James.
>>> I guess I have to point this out with a little bit of trepidation. But
>>> if you have an unsure folder you are probably using SpamBayes not
>>> Spamassassin.
>>
>> Thanks for saying it for me. {^_-} Bayes alone is a bicycle with one pedal.
>> Add some rules and DNS tests and you go from "unsure" to "pretty darned
>> sure" - at the level of one in a thousand or so. Add FuzzyOCR and you
>> step up from coaster brakes to caliper brakes. A fully loaded SpamAssassin
>> is to a mere SpamBayes as a top of the line multi-speed bicycle is to
>> a broken down beach cruiser with one pedal broken off. SpamBayes has its
>> uses if one lives on an Internet Beach, I suppose. It makes it look to the
>> locals like you get some exercise even if it can't go anywhere.
>>
>> {^_-}
>>
>
> Correct me if I'm wrong but SpamBayes can be run with little effort in a
> pop/imap proxy setup while you can't do that with SpamAssassin (at least,
> as I understand it, SpamAssassin is for when you are running your own mail
> server where SpamAssassin can filter out incoming mail but it's fairly
> useless when you're a client connecting to a mail server that does a poor
> job or no spam filtering)?  If I'm wrong please correct me as I also like
> what SpamAssassin has to offer and it /does/ seem to offer more than
> SpamBayes but it's certainly not obvious (at least to me) how to use it as
> a client side spam buster as opposed to the server side.

The setup here flows the email this way:
1) fetchmail gets it from the various mail servers and accounts every couple
   minutes. (We use per user .fetchmailrc files for Loren and me. For each
   of us we pull mail from several accounts and drop it into one.)
2) Fetchmail feeds mail directly to procmail. (We use individual .procmailrc
   files as well.)
3) Procmail drops the mail into the user accounts in /var/spool/mail.
4) Dovecot is used as the final step to feed the mail to the actual mail
   reader.
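
A minimal per-user .procmailrc for step 3 might start like the sketch
below. The paths and the log file name are assumptions for illustration,
not the actual files used here. Fetchmail would typically hand mail to
procmail through an "mda" line in .fetchmailrc.

# Sketch of the top of a ~/.procmailrc (all values are illustrative)
SHELL=/bin/sh
# Relative folder names in recipes are resolved against MAILDIR.
MAILDIR=$HOME/mail
# Hypothetical log file; handy for checking which recipe fired.
LOGFILE=$HOME/procmail.log
# Mail that no recipe claims lands in the system mailbox Dovecot serves.
DEFAULT=/var/spool/mail/$LOGNAME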

I have Dovecot configured to use SSL connections. I also have it open to
the Internet along with ssh. That means I pull down the mail from the same
place through the same filters no matter what machine or where I read
the mail. This is quite handy. I don't travel much. But I get all the
pleasures of home for email service with little or no effort on my part.

Now, if I always read mail on the same machine that runs Fetchmail, most
mail readers could pull it straight from /var/spool/mail/xxx with no
problem at all.

There are some advantages to using Fetchmail -> procmail. You can tune
Fetchmail for pulling the mail in and possibly leaving copies on the
ISP's servers. And with procmail I can do things like this:

:0:
* ^From: AntiSpam UOL <.*@uol\.com\.br>
#/dev/null
$HOME/mail/uol_crap

That catches every uol.com.br AntiSpam challenge-response message and
tosses it into a bin where I can monitor how many come in for each mail
I send. The following folks yanked my chain with bad email just a few
too many times. "Bye bye!"

:0:
* ^From: MAILER-DAEMON@ceres\.concept\.net\.nz
/dev/null

A fun thing I do using "formail" is doctor mailing list headers.

:0 fw
* ^TO_:.*(dev@spamassassin\.apache\.org|dev\.spamassassin\.apache\.org)
| formail -A "$PROCMAILMATCH SpamAssassin Dev list" -i "Reply-to: dev@spamassassin.apache.org"

That way a simple reply will go to the list rather than the individual
as God and nature intended.

When I get email from Loren I get a strange noise to alert me:

# Mail from Loren
:0 ic
* ^From: .*Loren\.Wilton@unisys\.com
| play /usr/share/sounds/gnibbles/reverse.wav -r 36000 repeat 9

And I do other fun and strange things with procmail. So it's worth it to
run it.

Now, I may spend about 3 seconds clock time per email scanning it. BUT,
I get very good ham vs spam discrimination. I use scores to doctor other
scores for mailing lists. LKML has some characteristics that are normally
very much spam-sign. I solve that by "expanding" the Bayes scores for LKML.
Really high Bayes gets a net extra point or two. Very low Bayes subtracts
some points. "Bye bye list spam!"

Once you are no longer relying on one single sensor to censure or censor
(that one was fun) your email you can tailor discrimination to group
characteristics as well as message characteristics.

The CPU time spent is invisible to me. Mail is very slightly delayed,
maybe five seconds or so. But then, how often does your reader update its
list of messages? It's really no big deal. The time I save with SpamAssassin
in the loop is the big win. I don't have to worry about spam.

I use a SpamAssassin trick to tag the subject line of spam with a rule
trigger for sorting into a spam folder. I also place the spam score in
the subject line. I can sort the folder by subject and spend scanning
time looking at the low scoring messages 'just in case' or to pull
the message and study it for making new rules if that is needed. Then
I can flip to the other end to get my giggles at how high some messages
can score hitting literally dozens of unique rules. Leo Kuvayev has
managed to get up over 100. (Back then his spam was VERY characteristic.
And when he got to 90 I urged him on via the spamassassin users list. We
all KNOW the spammers are monitoring it and foaming at the mouth over how
effective it can be. We see them morph as we nail them. It's a fun game
when you're winning.)
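
One common way to put SpamAssassin into the procmail path on the client
side and file what it flags is sketched below. It assumes spamd is
running so that spamc can talk to it, and it keys on the X-Spam-Flag
header SpamAssassin adds rather than on the subject tag described above.
The size limit, lock file name, and folder name are illustrative
assumptions, not the rules actually used here.

# Pass messages under roughly 256 KB through SpamAssassin via spamc.
:0 fw: spamassassin.lock
* < 256000
| spamc

# File anything SpamAssassin marked as spam into its own folder.
:0:
* ^X-Spam-Flag: YES
$HOME/mail/spam

Anything that does not match falls through to the remaining recipes and
then to normal delivery.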

{^_-}