Large Prod Env Mail Host Was [Re: ClamAV Feedback]

Mon Oct 25 18:42:03 UTC 2004

Ow Mun Heng wrote:
> On Sat, 2004-10-23 at 08:20, Rick Stevens wrote:
> 
>>I use ClamAV in a fairly large production environment (three outgoing
>>servers, four incoming servers) serving 10,000 domains and 120,000
>>accounts.
> 
>  
> Interesting. Exactly what I would like to know more about.
> 
> 
>>This is all done with open-source software
>>(sendmail, procmail, ClamAV, OpenLDAP) 
> 
> 
> Sendmail scales well?? I was reading that sendmail is slower compared to
> postfix.

If you know how to tune sendmail and the kernel, it works quite well.
Postfix is inappropriate for our task as it's not configurable enough
for our rather odd layout.  I need sendmail's check_mail and check_rcpt
rulesets for authentication and for rewriting headers for wildcard
delivery.

> As such, can you tell us, (if it's possible) exactly how the
> implementation is done at your side?
> 
> Internet
> 	|
> sendmail(OpenLDAP (Auth))
> 	|
> procmail
> 	|
> ClamAV
> 	| 
> BogoFilter
> 	|
> Mail Client
> (Is there a MySQL backend anywhere in there?)
> 
> Something like that? I'm looking for something that scales well.

Well, in a nutshell, it's this:

We use sendmail 8.12.11 right now, in a fairly standard configuration
(much like what Fedora ships with).  Milter and LDAP support must be
enabled, and the milter library (libmilter) must be installed in
/usr/lib or a similar location so ClamAV can find it when it is built.

We have several custom schemas in LDAP to accommodate our additional
attributes, and every user account has an RFC 2307-compliant record in
our LDAP system ("objectClass: posixAccount") along with the additional
schema we invented ("objectClass: emailAccount").

For outgoing mail, the sender's domain and username are checked against
LDAP via entries in the "check_mail" ruleset.  If they aren't found, the
mail is rejected.  If the sender is accepted, the normal TLS mechanism is
used to authenticate and send mail.  The passwords are stored in LDAP.
All messages are run through ClamAV for virus filtering.  Virus-infected
mail is silently discarded.
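A minimal Python sketch of the sender lookup described above, assuming an in-memory dict in place of the LDAP query.  The Directory class and check_mail() function are hypothetical names for illustration; this is not the actual sendmail ruleset, and the later authentication and ClamAV steps are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Directory:
    """Hypothetical stand-in for LDAP: domain -> set of usernames."""
    accounts: Dict[str, Set[str]] = field(default_factory=dict)

    def has_user(self, domain: str, user: str) -> bool:
        return user in self.accounts.get(domain, set())


def check_mail(directory: Directory, sender: str) -> str:
    """Return "accept" or "reject" for an envelope sender address."""
    user, sep, domain = sender.rpartition("@")
    if not sep or not user:
        return "reject"                  # not a user@domain address
    # Domain names are case-insensitive; usernames are compared as-is here.
    if directory.has_user(domain.lower(), user):
        return "accept"                  # next: authentication, then ClamAV
    return "reject"                      # unknown domain or username


if __name__ == "__main__":
    d = Directory({"example.com": {"alice"}})
    print(check_mail(d, "alice@example.com"))  # accept
    print(check_mail(d, "bob@example.com"))    # reject
```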

For incoming mail, the recipient's domain name is checked against our
LDAP entries via entries in the "check_rcpt" ruleset.  If the domain
isn't found, the mail is rejected.

Next, the username is checked against that domain in LDAP.  If the user
isn't found, we look for a wildcard mailbox (called "catchall").  If
that's found, the headers for the mail are rewritten and delivery is
made to the catchall mailbox.  If neither the user nor the catchall is
found, the mail is rejected.  Again, all incoming mail is run through
ClamAV for virus filtering and virus-infected mail is silently
discarded.
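The incoming decision (domain, then user, then the "catchall" mailbox, otherwise reject) can be sketched in Python as below.  This assumes a dict mapping each domain to its mailbox names in place of the LDAP lookups; route_recipient() and Verdict are hypothetical, and the header rewrite for catchall delivery is only marked with a comment.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple

CATCHALL_MAILBOX = "catchall"  # name of the wildcard mailbox


class Verdict(Enum):
    DELIVER = "deliver"
    CATCHALL = "catchall"
    REJECT = "reject"


def route_recipient(accounts: Dict[str, Set[str]],
                    rcpt: str) -> Tuple[Verdict, Optional[str]]:
    """Return the verdict and target mailbox for one envelope recipient.

    `accounts` maps domain -> mailbox names and stands in for LDAP.
    """
    user, sep, domain = rcpt.rpartition("@")
    domain = domain.lower()
    if not sep or domain not in accounts:
        return Verdict.REJECT, None                # unknown domain
    mailboxes = accounts[domain]
    if user in mailboxes:
        return Verdict.DELIVER, f"{user}@{domain}"
    if CATCHALL_MAILBOX in mailboxes:
        # Per the description, headers are rewritten here before delivery.
        return Verdict.CATCHALL, f"{CATCHALL_MAILBOX}@{domain}"
    return Verdict.REJECT, None                    # no user and no catchall


if __name__ == "__main__":
    accts = {"example.com": {"alice", CATCHALL_MAILBOX}, "other.org": {"bob"}}
    print(route_recipient(accts, "alice@example.com"))
    print(route_recipient(accts, "zed@example.com"))
    print(route_recipient(accts, "zed@other.org"))
```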

Delivery is handled by a custom program called "mailwedge".  It checks
the additional LDAP attributes for things like forwarding and
autoresponders and handles those items.  If, after all of the extra
attributes have been checked, we still need to deliver to a mailbox on
our systems, mailwedge invokes procmail with a customized procmail
script to do the actual delivery.  This is also where Bogofilter is
called when spam filtering is enabled.  We only add Bogofilter's
"X-Bogosity" header, which states whether Bogofilter thinks the message
is spam.  As an ISP, we can't arbitrarily discard mail, so we leave it
up to customers to set up filters in their mail client software to
discard or otherwise segregate mail based on that header.
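A rough Python sketch of the kind of dispatch mailwedge performs.  mailwedge is not public, so the attribute names (forward_to, autoreply_text, spam_filter), the order of the checks, and the rule that forwarding replaces local delivery are all assumptions.  The spam classifier is passed in as a function rather than actually running bogofilter, and the only change it makes to the message is adding the X-Bogosity header.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Callable, Optional


@dataclass
class MailboxAttrs:
    """Hypothetical per-user LDAP attributes consulted before delivery."""
    forward_to: Optional[str] = None      # forward instead of storing locally
    autoreply_text: Optional[str] = None  # send an autoresponse as well
    spam_filter: bool = False             # tag with Bogofilter's verdict


@dataclass
class Actions:
    """What the dispatcher decided to do with one message."""
    forwarded_to: Optional[str] = None
    autoreplied: bool = False
    delivered: bool = False


def dispatch(msg: EmailMessage, attrs: MailboxAttrs,
             classify: Callable[[EmailMessage], str]) -> Actions:
    actions = Actions()
    if attrs.autoreply_text is not None:
        actions.autoreplied = True        # a real system would queue the reply
    if attrs.forward_to is not None:
        actions.forwarded_to = attrs.forward_to
        return actions                    # assumption: no local copy is kept
    if attrs.spam_filter:
        # Tag only; never discard.  Clients filter on this header.
        msg["X-Bogosity"] = classify(msg)
    actions.delivered = True              # hand-off point to procmail
    return actions


if __name__ == "__main__":
    m = EmailMessage()
    m["Subject"] = "hello"
    m.set_content("body")
    result = dispatch(m, MailboxAttrs(spam_filter=True),
                      lambda msg: "Spam, tests=bogofilter")
    print(result.delivered, m["X-Bogosity"])
```

A customer's mail client would then match on the X-Bogosity header to file or delete the message; the exact verdict strings depend on the Bogofilter version and configuration.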

We use a custom-written UID/GID daemon that dynamically allocates UIDs
and GIDs as needed for delivery.  All user mailboxes live on a NAS
device mounted via NFS.  The directory structure is:

	/mountpoint/domain-name.tld/username
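The UID/GID daemon is custom and unpublished, so the following is only a hypothetical Python sketch of how dynamic allocation with reuse might work: UIDs are leased per account from a fixed range and returned to a free list on release.  UidPool, mailbox_path() and the range values are illustrative assumptions; GIDs could be handled by a second pool of the same kind.  mailbox_path() builds the /mountpoint/domain-name.tld/username layout shown above and refuses path components that could escape it.

```python
import posixpath
from typing import Dict, List


class UidPool:
    """Hypothetical dynamic UID allocator that reuses released UIDs."""

    def __init__(self, first: int, last: int) -> None:
        self._next = first                  # next never-used UID
        self._last = last                   # highest UID we may hand out
        self._free: List[int] = []          # released UIDs, ready for reuse
        self._leases: Dict[str, int] = {}   # "user@domain" -> leased UID

    def acquire(self, account: str) -> int:
        if account in self._leases:         # same account, same UID
            return self._leases[account]
        if self._free:
            uid = self._free.pop()
        elif self._next <= self._last:
            uid = self._next
            self._next += 1
        else:
            raise RuntimeError("UID range exhausted")
        self._leases[account] = uid
        return uid

    def release(self, account: str) -> None:
        uid = self._leases.pop(account, None)
        if uid is not None:
            self._free.append(uid)


def mailbox_path(mountpoint: str, domain: str, user: str) -> str:
    """Build /mountpoint/domain-name.tld/username on the NFS mount."""
    for part in (domain, user):
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"unsafe path component: {part!r}")
    return posixpath.join(mountpoint, domain.lower(), user)


if __name__ == "__main__":
    pool = UidPool(50000, 50001)
    print(pool.acquire("alice@example.com"))  # 50000
    pool.release("alice@example.com")
    print(pool.acquire("bob@example.com"))    # 50000 again (reused)
    print(mailbox_path("/mountpoint", "Example.COM", "bob"))
```

In this sketch, the number of UIDs in use tracks concurrent deliveries and sessions rather than the total number of accounts, because released UIDs go back into the pool.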

The POP and IMAP servers are tweaked so that they also use LDAP for
authentication and are extremely careful about write operations.  They
also use the UID/GID daemon to get UIDs for a given
session.  Webmail is based on Horde/IMP and is also tweaked to allow
users to change their passwords and other things on the LDAP servers.

The only parts of the system that aren't open source are the UID/GID
daemon and the mailwedge program.  The rest is off-the-shelf open-source
software with some (rather clever, if I do say so myself) tweaks.

The upshot of this is that the users are not in any passwd file, nor do
they really have accounts on any of the servers.  They are figments of
LDAP's imagination and sendmail, procmail, POP and IMAP all play along.
The dynamic UID/GID daemon allows us to reuse UIDs and GIDs, and our
tests indicate the whole system scales to over ten billion individual
accounts.  Sure, you'll need more servers to handle that many accounts,
but servers are cheap, and every server in a given role (outgoing mail,
incoming mail, POP or IMAP) is configured identically.

"If it's stupid and it works...it ain't stupid!"
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-        Polygon: A dead parrot (With apologies to John Cleese)      -
----------------------------------------------------------------------