
Re: [Linux-cluster] force fencing





On Mon, Jul 6, 2009 at 10:08 AM, Armanet Stephane <armanets ill fr> wrote:
Hello list

I'm trying to set up a 3-node cluster with 2 failover domains for an HA
mail solution.
I want 1 node active for the IMAP server in the Imap failover domain, 1
node active for SMTP in the Smtp failover domain, and the 3rd node in
both failover domains as a backup node.

I run CentOS 5.3.
My fence device is a WTI power switch.

My cluster.conf is attached.

My SMTP service is composed of:
       1 IP
       1 amavisd script
       1 postfix script
       2 NFS mounts for postfix and amavis

If I manually kill the postfix master process (to simulate a crash), my
node is not fenced and the logs say:

Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <info> Executing
/etc/init.d/postfix status
Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <err> script:postfix:
status of /etc/init.d/postfix failed (returned 3)
Jul  6 10:00:40 centos-smtp1 clurgmgrd[4228]: <notice> status on script
"postfix" returned 1 (generic error)
Jul  6 10:00:40 centos-smtp1 clurgmgrd[4228]: <notice> Stopping service
service:Postfix
Jul  6 10:00:40 centos-smtp1 clurgmgrd: [4228]: <info> Executing
/etc/init.d/amavisd stop
Jul  6 10:00:40 centos-smtp1 kernel: do_vfs_lock: VFS is out of sync
with lock manager!
Jul  6 10:00:40 centos-smtp1 last message repeated 8 times
Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <info> Executing
/etc/init.d/postfix stop
Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <err> script:postfix:
stop of /etc/init.d/postfix failed (returned 1)
Jul  6 10:00:41 centos-smtp1 clurgmgrd[4228]: <notice> stop on script
"postfix" returned 1 (generic error)
Jul  6 10:00:41 centos-smtp1 clurgmgrd: [4228]: <info> Removing IPv4
address 195.83.126.201/24 from bond0
Jul  6 10:00:41 centos-smtp1 avahi-daemon[3552]: Withdrawing address
record for 195.83.126.201 on bond0.
Jul  6 10:00:51 centos-smtp1 clurgmgrd: [4228]: <info> unmounting
/var/lib/amavis
Jul  6 10:00:51 centos-smtp1 clurgmgrd: [4228]: <info> unmounting
/var/spool/postfix
Jul  6 10:00:51 centos-smtp1 clurgmgrd[4228]: <crit> #12: RG
service:Postfix failed to stop; intervention required
Jul  6 10:00:51 centos-smtp1 clurgmgrd[4228]: <notice> Service
service:Postfix is failed
Jul  6 10:00:52 centos-smtp1 ntpd[3322]: synchronized to 195.83.126.119,
stratum 1

clustat says:

Cluster Status for cluster-test @ Mon Jul  6 10:02:39 2009
Member Status: Quorate

 Member Name                                                   ID   Status
 ------ ----                                                   ---- ------
 centos-imap1.ill.fr                                              1 Online, Local, rgmanager
 centos-imap2.ill.fr                                              2 Online, rgmanager
 centos-smtp1.ill.fr                                              3 Online, rgmanager
 /dev/disk/by-id/scsi-360a98000567247514634507447594661-part1     0 Online, Quorum Disk

 Service Name      Owner (Last)              State
 ------- ----      ----- ------              -----
 service:Imap      centos-imap2.ill.fr       started
 service:Postfix   (centos-smtp1.ill.fr)     failed




So I have to disable the Postfix service with:
       clusvcadm -d Postfix
and re-enable it with:
       clusvcadm -e Postfix



Could you explain why my original SMTP node is not fenced and why my
service is not started on the 2nd node?

Nodes are fenced only when they lose communication with the other nodes, not when a service fails.
You should check that the init script works correctly outside the cluster; return values are important. I think it is failing in your case because you killed postfix in a way that deleted the .pid file, and that made the init script fail.
BTW, you should configure the service with recovery="relocate" if you want it to be started on a different node.
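
The return-value check mentioned above could be sketched roughly like this. It is only a minimal sketch, assuming the init script is meant to follow the LSB exit-code conventions (status: 0 = running, 3 = not running; stop should return 0 even if the daemon is already dead). The function names are made up for illustration and are not part of rgmanager or postfix.

```shell
#!/bin/sh
# Sketch: inspect an init script's return values outside the cluster.
# All function names here are hypothetical helpers.

# Map an LSB "status" exit code to a short description.
lsb_status_meaning() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead, pid file exists" ;;
        2) echo "dead, lock file exists" ;;
        3) echo "not running" ;;
        4) echo "status unknown" ;;
        *) echo "non-LSB code" ;;
    esac
}

# Run "status" and then "stop" on the given script and report both codes.
# In the logs above, a non-zero return from "stop" is what left the
# service in the "failed" state.
check_init_script() {
    script="$1"

    "$script" status >/dev/null 2>&1
    rc=$?
    echo "status returned $rc ($(lsb_status_meaning "$rc"))"

    "$script" stop >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "stop returned 0: OK"
    else
        echo "stop returned $rc: the cluster would treat this stop as failed"
    fi
}

# Example, on a test node after killing the daemon by hand:
#   check_init_script /etc/init.d/postfix
```

If "stop" reports a non-zero code after the daemon has been killed, the init script is the thing to fix (or wrap) before testing failover again.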

Greetings,
Juanra



Is there a way to force fencing?


--
ARMANET Stephane
Division Projet Technique
Service Informatique
 Groupe Infrastructure

Institut Laue langevin

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster

