[Linux-cluster] RHEL3 Cluster network hangup


I am running RHEL3 ES with the Red Hat Cluster Suite (no GFS, just a simple failover cluster).

The clustered application does a lot of printing (lprng), faxing (hylafax) and mailing (sendmail). It uses shell scripts to pass the jobs to the operating system's daemons.

The client programs of these daemons, which pass jobs to the daemons over network connections to localhost, start to behave erratically once the cluster has been up for about two weeks:

- hylafax: faxstat stops listing the transmitted faxes in the middle of the list (but always at the same job).
- sendmail opens a connection to the local daemon but does not transfer the message. Both processes sit there and wait; after some time the server closes the connection because no input arrives from the client side.
- lpr shows the same behaviour as sendmail.

I assume that something locks up in the IP stack. Not all services are affected at the same time.
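
To check this assumption the next time a hang occurs, the state of the loopback sockets could be recorded. The following is only a rough sketch in Python, not something that is in use on the cluster. It assumes the usual /proc/net/tcp column layout and the little-endian address form seen on x86 (127.0.0.1 appears as 0100007F); the function name parse_loopback_sockets is made up for illustration.

```python
# TCP state codes as they appear (in hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

# Assumption: little-endian host (x86), where 127.0.0.1 is printed as 0100007F.
LOOPBACK_HEX = "0100007F"


def parse_loopback_sockets(proc_net_tcp_text):
    """Return (local_port, remote_port, state, tx_queue, rx_queue) tuples
    for every socket whose local address is 127.0.0.1."""
    rows = []
    # The first line of /proc/net/tcp is a column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or truncated lines
        local_addr, local_port = fields[1].split(":")
        _remote_addr, remote_port = fields[2].split(":")
        if local_addr != LOOPBACK_HEX:
            continue
        state = TCP_STATES.get(fields[3].upper(), "UNKNOWN(" + fields[3] + ")")
        # Queue sizes are hex byte counts in the form tx:rx.
        tx_queue, rx_queue = fields[4].split(":")
        rows.append((int(local_port, 16), int(remote_port, 16), state,
                     int(tx_queue, 16), int(rx_queue, 16)))
    return rows
```

Feeding it the contents of /proc/net/tcp while a client is stuck would show whether the connection to port 25 or 515 is established and whether bytes are sitting unread in a receive queue, which would help tell a stalled client or daemon apart from a problem in the stack itself.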

I guess this is related to the cluster software, because we run the same application on many other servers, none of which are clustered, and none of them show this behaviour.

Any hints?

regards, Gunther
