
[Linux-cluster] frozen services are stopped when rgmanager is restarted



Hi,

RHEL 5.4, cluster2 (I think).

I expected to be able to freeze a service on a node and restart rgmanager on that node without interrupting the service. In practice, starting rgmanager causes the service to be stopped.

Is this what is supposed to happen? I thought the whole point of freezing services was to allow maintenance (including restarting the cluster software).

Are there any options to prevent the services from being stopped when rgmanager is started?

One side effect of rgmanager stopping the service is that the cluster ends up in an inconsistent state. Once rgmanager has restarted, clustat still reports the services as started and frozen, when in reality they are stopped. Any attempt to unfreeze a service then causes it to fail over to a standby node.

regards,

Martin

sudo /usr/sbin/clustat
Cluster Status for EDISV1DBM @ Mon Jun 21 16:27:05 2010
Member Status: Quorate

 Member Name                                           ID   Status
 ------ ----                                           ---- ------
 svXprdclu001                                              1 Online, rgmanager
 svXprdclu002                                              2 Online, Local, rgmanager
 svXprdclu003                                              3 Online, rgmanager
 svXprdclu004                                              4 Online, rgmanager
 svXprdclu005                                              5 Online, rgmanager

 Service Name                                 Owner (Last)                                 State
 ------- ----                                 ----- ------                                 -----
 service:ACTIVESITE                           svXprdclu002                                 started
 service:MASTERVIP                            svXprdclu002                                 started

[martin cp1edidbm002 ~]$ sudo /usr/sbin/clusvcadm -Z ACTIVESITE
Local machine freezing service:ACTIVESITE...Success

[martin cp1edidbm002 ~]$ sudo /usr/sbin/clusvcadm -Z MASTERVIP
Local machine freezing service:MASTERVIP...Success

[martin cp1edidbm002 ~]$ sudo /usr/sbin/clustat
Cluster Status for EDISV1DBM @ Mon Jun 21 16:34:02 2010
Member Status: Quorate

 Member Name                                           ID   Status
 ------ ----                                           ---- ------
 svXprdclu001                                              1 Online, rgmanager
 svXprdclu002                                              2 Online, Local, rgmanager
 svXprdclu003                                              3 Online, rgmanager
 svXprdclu004                                              4 Online, rgmanager
 svXprdclu005                                              5 Online, rgmanager

 Service Name                                 Owner (Last)                                 State
 ------- ----                                 ----- ------                                 -----
 service:ACTIVESITE                           svXprdclu002                                 started    [Z]
 service:MASTERVIP                            svXprdclu002                                 started    [Z]

[martin cp1edidbm002 ~]$ sudo /etc/init.d/rgmanager stop
Shutting down Cluster Service Manager...
Waiting for services to stop:                              [  OK  ]
Cluster Service Manager is stopped.

[martin cp1edidbm002 ~]$ sudo /etc/init.d/rgmanager start
Starting Cluster Service Manager:                          [  OK  ]

#
# the services are stopped by rgmanager start. Ugh!
#

[martin cp1edidbm002 ~]$ sudo /usr/sbin/clustat
Cluster Status for EDISV1DBM @ Mon Jun 21 16:35:34 2010
Member Status: Quorate

 Member Name                                           ID   Status
 ------ ----                                           ---- ------
 svXprdclu001                                              1 Online, rgmanager
 svXprdclu002                                              2 Online, Local, rgmanager
 svXprdclu003                                              3 Online, rgmanager
 svXprdclu004                                              4 Online, rgmanager
 svXprdclu005                                              5 Online, rgmanager

 Service Name                                 Owner (Last)                                 State
 ------- ----                                 ----- ------                                 -----
 service:ACTIVESITE                           svXprdclu002                                 started    [Z]
 service:MASTERVIP                            svXprdclu002                                 started    [Z]
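
Because clustat keeps showing both services as "started [Z]" after the restart, the frozen flag alone says nothing about whether the resources are really up. A minimal sketch for listing which services clustat considers frozen (for example, to record them before and after restarting rgmanager), assuming the default clustat table layout shown above, with the service name in the first column and "[Z]" as the last field of a frozen row. `frozen_services` is a hypothetical helper name, not part of rgmanager:

```shell
#!/bin/sh
# Hypothetical helper (not part of rgmanager): print the names of services
# that clustat marks as frozen. Assumes the default clustat layout shown
# above, where a frozen service row ends with the "[Z]" flag, e.g.
#   service:ACTIVESITE   svXprdclu002   started    [Z]
frozen_services() {
    awk '$1 ~ /^service:/ && $NF == "[Z]" {
        name = $1
        sub(/^service:/, "", name)   # strip the "service:" prefix
        print name
    }'
}

# Usage (reads clustat output on stdin):
#   sudo /usr/sbin/clustat | frozen_services
```

The list only reflects what rgmanager believes; it still has to be compared with each service's own status check (such as the `dc-dsm status` action seen in the logs) to spot the mismatch.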
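
=========================================

The logs show both services being stopped while rgmanager starts on svXprdclu002: the stop actions run between "Initializing Services" and "Services Initialized", even though both services were frozen.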

Jun 21 16:31:19 cp1edidbm002 clurgmgrd: [14256]: <info> Executing /home/martin/dc-dsm status
Jun 21 16:34:58 cp1edidbm002 rgmanager: [15526]: <notice> Shutting down Cluster Service Manager...
Jun 21 16:34:58 cp1edidbm002 clurgmgrd[14256]: <notice> Shutting down
Jun 21 16:35:08 cp1edidbm002 clurgmgrd[14256]: <notice> Shutdown complete, exiting
Jun 21 16:35:08 cp1edidbm002 rgmanager: [15526]: <notice> Cluster Service Manager is stopped.

Jun 21 16:35:16 cp1edidbm002 kernel: dlm: Using TCP for communications
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 4
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 5
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 1
Jun 21 16:35:16 cp1edidbm002 kernel: dlm: got connection from 3
Jun 21 16:35:17 cp1edidbm002 clurgmgrd[15574]: <notice> Resource Group Manager Starting
Jun 21 16:35:17 cp1edidbm002 clurgmgrd[15574]: <info> Loading Service Data
Jun 21 16:35:17 cp1edidbm002 clurgmgrd[15574]: <info> Initializing Services
Jun 21 16:35:17 cp1edidbm002 clurgmgrd: [15574]: <info> Executing /bin/true stop
Jun 21 16:35:17 cp1edidbm002 clurgmgrd: [15574]: <info> Removing IPv4 address 10.3.17.20/24 from bond0
Jun 21 16:35:27 cp1edidbm002 clurgmgrd: [15574]: <info> Executing /home/martin/dc-dsm stop
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> Services Initialized
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: Local UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu001 UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu003 UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu004 UP
Jun 21 16:35:27 cp1edidbm002 clurgmgrd[15574]: <info> State change: svXprdclu005 UP
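
One of the stop actions above (the IPv4 address being removed from bond0) can be confirmed independently of clustat. A minimal sketch, assuming iproute2's one-line output (`ip -o addr show dev bond0`), where the third field of an address line is `inet` and the fourth is `address/prefix`. `has_ipv4` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the given IPv4 address/prefix appears in
# "ip -o addr show" output read from stdin, exit 1 otherwise.
# In one-line mode an IPv4 line looks roughly like:
#   4: bond0    inet 10.3.17.20/24 brd 10.3.17.255 scope global bond0 ...
has_ipv4() {
    awk -v want="$1" '$3 == "inet" && $4 == want { found = 1 }
                      END { exit !found }'
}

# Usage after restarting rgmanager (address and interface taken from the log):
#   if ip -o addr show dev bond0 | has_ipv4 10.3.17.20/24; then
#       echo "10.3.17.20/24 is still on bond0"
#   else
#       echo "10.3.17.20/24 was removed from bond0"
#   fi
```

If the address is gone while clustat still shows the service as "started [Z]", the cluster state and the real state have diverged as described above.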
