
Re: [Linux-cluster] force fencing

On Mon, Jul 6, 2009 at 10:08 AM, Armanet Stephane <armanets ill fr> wrote:
Hello list

I'm trying to set up a 3-node cluster with 2 failover domains for an HA
mail solution.
I want 1 node active for the IMAP server in the IMAP failover domain, 1
node active for SMTP in the SMTP failover domain, and the 3rd node in
both failover domains as a backup node.

I run CentOS 5.3.
My fence device is a WTI power switch.

My cluster.conf is attached.

My SMTP service is composed of:
       1 IP
       1 amavisd script
       1 postfix script
       2 NFS mounts for postfix and amavis

If I manually kill the postfix master process (to simulate a crash), my
node is not fenced and the logs say:

Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <info> Executing /etc/init.d/postfix status
Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <err> script:postfix: status of /etc/init.d/postfix failed (returned 3)
Jul  6 10:00:40 centos-smtp1 clurgmgrd[4228]: <notice> status on script "postfix" returned 1 (generic error)
Jul  6 10:00:40 centos-smtp1 clurgmgrd[4228]: <notice> Stopping service
Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <info> Executing /etc/init.d/amavisd stop
Jul  6 10:00:40 centos-smtp1 kernel: do_vfs_lock: VFS is out of sync with lock manager!
Jul  6 10:00:40 centos-smtp1 last message repeated 8 times
Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <info> Executing /etc/init.d/postfix stop
Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <err> script:postfix: stop of /etc/init.d/postfix failed (returned 1)
Jul  6 10:00:41 centos-smtp1 clurgmgrd[4228]: <notice> stop on script "postfix" returned 1 (generic error)
Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <info> Removing IPv4 address from bond0
Jul  6 10:00:41 centos-smtp1 avahi-daemon[3552]: Withdrawing address record for on bond0.
Jul  6 10:00:51 centos-smtp1 clurgmgrd: [4228]: <info> unmounting
Jul  6 10:00:51 centos-smtp1 clurgmgrd: [4228]: <info> unmounting
Jul  6 10:00:51 centos-smtp1 clurgmgrd[4228]: <crit> #12: RG service:Postfix failed to stop; intervention required
Jul  6 10:00:51 centos-smtp1 clurgmgrd[4228]: <notice> Service service:Postfix is failed
Jul  6 10:00:52 centos-smtp1 ntpd[3322]: synchronized to, stratum 1

clustat reports:

Cluster Status for cluster-test @ Mon Jul  6 10:02:39 2009
Member Status: Quorate

 Member Name                                                   ID   Status
 ------ ----                                                   ---- ------
 centos-imap1.ill.fr                                           1    Online, Local, rgmanager
 centos-imap2.ill.fr                                           2    Online, rgmanager
 centos-smtp1.ill.fr                                           3    Online, rgmanager
 /dev/disk/by-id/scsi-360a98000567247514634507447594661-part1  0    Online, Quorum Disk

 Service Name        Owner (Last)              State
 ------- ----        ----- ------              -----
                     centos-imap2.ill.fr       started
                     (centos-smtp1.ill.fr)     failed

So I have to disable the Postfix service with:
       clusvcadm -d Postfix
and re-enable it with:
       clusvcadm -e Postfix

Could you explain why my original SMTP node is not fenced and why my
service is not started on the 2nd node?

Nodes are fenced only when they lose communication with the other nodes, not when a service fails.
You should check the init scripts to make sure they work correctly outside the cluster; the return values are important. Your log shows that "stop of /etc/init.d/postfix failed (returned 1)", and a service whose stop fails is marked failed instead of being recovered. I think the stop is failing because you killed postfix in a way that deleted the .pid file, and that made the init script fail.
BTW, you should configure the services with recovery="relocate" if you want them to be started on a different node.
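
One way to check the return values by hand is a small shell sketch like the one below. It relies on two assumptions: that rgmanager treats any non-zero exit from "stop" as a failure (which matches the log above), and that the init script should follow the LSB convention where "status" returns 3 for a stopped service and "stop" returns 0 even when the daemon is already gone. The function name check_init_script and its messages are made up for illustration; they are not part of rgmanager.

```shell
#!/bin/sh
# Hypothetical helper (not part of rgmanager): report the exit codes of an
# init script's "status" and "stop" actions. Run it while the daemon is
# already dead, e.g. right after killing the postfix master process.
check_init_script() {
    script=$1

    # Capture exit codes without aborting if the caller uses "set -e".
    status_rc=0
    "$script" status >/dev/null 2>&1 || status_rc=$?
    stop_rc=0
    "$script" stop >/dev/null 2>&1 || stop_rc=$?

    echo "status returned $status_rc, stop returned $stop_rc"

    # Assumption: a non-zero "stop" is what leaves the service in the
    # "failed" state, so flag it to the caller.
    if [ "$stop_rc" -ne 0 ]; then
        return 1
    fi
    return 0
}
```

For example, after killing the postfix master process on a test node, run "check_init_script /etc/init.d/postfix". If it reports a non-zero value for stop, the init script probably needs fixing before the cluster can recover the service on its own.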


Is there a way to force the fencing?

ARMANET Stephane
Division Projet Technique
Service Informatique
 Groupe Infrastructure

Institut Laue Langevin

Linux-cluster mailing list
Linux-cluster redhat com
