spamassassin/user_prefs

Tue Mar 23 14:31:13 UTC 2004

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday 23 March 2004 04:17 am, Nigel Wade wrote:
> Charles Howse wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > On Monday 22 March 2004 09:12 am, Nigel Wade wrote:
> >>Charles Howse wrote:
> >>>-----BEGIN PGP SIGNED MESSAGE-----
> >>>Hash: SHA1
> >>>
> >>>Hi,
> >>>
> >>>While reading another thread, I remembered I had no custom preferences
> >>>for spamassassin, and decided to create some.
> >>>
> >>>I use the default settings for starting spamassassin at boot, and the
> >>>following filters in KMail:
> >>>1. In KMail menus, select Settings->Configure Filters
> >>>2. Create a new filter with filter criteria:
> >>>    <any header> matches regular expression .
> >>>    (the regular expression is just the character "." meaning
> >>>    "any character")
> >>>    and filter action:
> >>>    pipe through spamc
> >>>    Uncheck the box "stop processing if this filter matches"
> >>>3. Add a second filter below the one created in step 2, with criteria:
> >>>    <any header> contains X-Spam-Flag: YES
> >>>    and action:
> >>>    move to folder trash
> >>>    (or whatever you want to do with your spam)
> >>>    check the "stop processing..." box
> >>>
> >>>These filters are working fine, with the exception of those html spams
> >>>with all the random words in the body when viewed in text mode.
> >>>
> >>>I was just wondering if anyone would like to share some _generic_
> >>>preferences for ~/.spamassassin/user_prefs, or comment.
> >>
> >>The way to catch those is with Bayesian filtering. You need to teach the
> >>Bayesian filter with sufficient messages so that it learns what is spam
> >>and what is not (at least 1000 of each is a good rule of thumb for best
> >>accuracy).
> >
> > For the sake of the original subject, I was interested in the user_prefs
> > file.
> >
> >
> > I'm periodically training it with sa-learn on the MissedSpam folder.
> > I'll 'get there' sooner or later.
> >
> > I have never seen a false positive in my FilteredSpam folder, so I see no
> > need to train it on what *is* spam.  Am I wrong?
>
> It's most important to train it with anything it misclassifies. But it's
> still a good idea to train it with both spam and ham which it has
> identified correctly. This way its database of spam and ham is kept
> current. If you don't keep training it, it will get steadily worse and
> worse as the spam evolves.

This is good information.  I had not thought of that.  Thank you.  I already
have a script that will do that; I just need to run it occasionally.
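
For anyone following along, here is a minimal sketch of what such a
retraining script might look like. It is not the actual script, only an
illustration under some assumptions: that KMail keeps each folder as an
mbox file under ~/Mail, and that known-good mail lives in a folder named
inbox. Only MissedSpam and FilteredSpam are folder names from this
thread; MAIL_DIR, SA_LEARN, DRY_RUN and the function names are made up
for the example.

```shell
#!/bin/sh
# Hypothetical retraining helper for sa-learn (a sketch, not a tested tool).
# Assumption: each mail folder is a single mbox file under $MAIL_DIR.

MAIL_DIR="${MAIL_DIR:-$HOME/Mail}"   # assumed KMail mail directory
SA_LEARN="${SA_LEARN:-sa-learn}"     # path to the sa-learn program

# Print the sa-learn command line for one folder.
# $1 = "spam" or "ham", $2 = folder name under $MAIL_DIR
learn_cmd() {
    printf '%s --%s --mbox %s\n' "$SA_LEARN" "$1" "$MAIL_DIR/$2"
}

# Train on one folder, or only print the command when DRY_RUN=1.
learn_folder() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        learn_cmd "$1" "$2"
    else
        "$SA_LEARN" "--$1" --mbox "$MAIL_DIR/$2"
    fi
}

# Feed the learner both misclassified and correctly classified mail.
retrain() {
    learn_folder spam MissedSpam     # spam the filter missed
    learn_folder spam FilteredSpam   # spam the filter caught
    learn_folder ham  inbox          # assumed folder of legitimate mail
}
```

Calling retrain with DRY_RUN=1 set only prints the three sa-learn
commands, which is a cheap way to check the paths before training for
real. If the folders are maildir rather than mbox, the --mbox switch
would need to be dropped.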

jdow has pointed me to a page with custom rulesets:
http://wiki.apache.org/spamassassin/CustomRulesets
but I'm still interested in comments or generic settings for my 
~/.spamassassin/user_prefs file.  Anyone care to share?

- -- 
Charles Howse
Jackson, TN
Registered Linux user # 347576 (http://counter.li.org)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQFAYEo1/S+VsB9RMKgRAqIvAJ9ti9YR+lSCvjPKzMLEGOpZL72nlgCfWa+6
Alwnou+SlMEtfXnJx+ATHzY=
=aOVp
-----END PGP SIGNATURE-----