[Linux-cluster] suggestion on freeze-on-node1 and unfreeze-on-node2 approach?

Gianluca Cecchi gianluca.cecchi at gmail.com
Fri Jan 8 12:03:38 UTC 2010


Hello, I have a cluster with an Oracle service and RHEL 5.4 nodes.
Typically one sets "shutdown abort" of the DB as the default
mechanism to stop the service, to prevent stalling and to speed up
failover of the service itself in case of problems.
The RHCS-provided script, which I'm using, takes the same approach.

But sometimes we have to do maintenance on the DB, and our strategy is
to freeze the service, manually stop the DB, make the modifications,
manually start the DB and unfreeze the service.

This works well when all the work is done on the node that is carrying
the service at that moment.
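
In that case the whole thing boils down to something like this (SRV
being the service name, as in the commands further below):

clusvcadm -Z SRV   # freeze: rgmanager stops checking/managing SRV
# ... DBAs stop the DB cleanly, do the maintenance, start it again ...
clusvcadm -U SRV   # unfreeze: rgmanager takes back control of SRV
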
Sometimes we also need to relocate the service as part of the work,
and for the DBAs a clean shutdown of the DB is desirable when the
activity is planned.
With the same approach we do something like this:

node1 with active service
- freeze of the service:
clusvcadm -Z SRV

- maintenance activities, with manual stop of the service components
(e.g. listener and Oracle instance)
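
For the Oracle pieces this is something like the following (the exact
commands are up to the DBAs; environment such as ORACLE_SID/ORACLE_HOME
is assumed to be already set):

lsnrctl stop                    # stop the listener
sqlplus -S / as sysdba <<EOF    # clean shutdown instead of "shutdown abort"
shutdown immediate
EOF
# ... maintenance work on the DB ...
sqlplus -S / as sysdba <<EOF
startup
EOF
lsnrctl start                   # listener up again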

- shutdown of node1
shutdown -h now

The shutdown takes about 2 minutes.
A full shutdown is necessary because every cluster command I tried
returned an error saying that the service was frozen and that the
command could not be run...

- Wait on the surviving node until:
1) it becomes master for the quorum disk, otherwise it loses quorum.
Messages in /var/log/qdiskd.log:
Jan  7 17:57:55 oracs1 qdiskd[7043]: <info> Node 2 shutdown
Jan  7 17:57:55 oracs1 qdiskd[7043]: <debug> Making bid for master
Jan  7 17:58:30 oracs1 qdiskd[7043]: <info> Assuming master role

This takes about 1 minute after the shutdown of the other node.

2) the cluster registers that the other node has gone.
Messages in /var/log/qdiskd.log:
Jan  7 18:00:35 oracs1 openais[7014]: [TOTEM] The token was lost in
the OPERATIONAL state.
Jan  7 18:00:35 oracs1 openais[7014]: [TOTEM] Receive multicast socket
recv buffer size (320000 bytes).
Jan  7 18:00:35 oracs1 openais[7014]: [TOTEM] Transmit multicast
socket send buffer size (320000 bytes).
Jan  7 18:00:35 oracs1 openais[7014]: [TOTEM] entering GATHER state from 2.
Jan  7 18:00:40 oracs1 openais[7014]: [TOTEM] entering GATHER state from 0.
Jan  7 18:00:40 oracs1 openais[7014]: [TOTEM] Creating commit token
because I am the rep.
Jan  7 18:00:40 oracs1 openais[7014]: [TOTEM] Saving state aru 24 high
seq received 24
Jan  7 18:00:40 oracs1 openais[7014]: [TOTEM] Storing new sequence id
for ring 4da34
Jan  7 18:00:40 oracs1 openais[7014]: [TOTEM] entering COMMIT state.
Jan  7 18:00:40 oracs1 openais[7014]: [TOTEM] entering RECOVERY state.
Jan  7 18:00:40 oracs1 openais[7014]: [TOTEM] position [0] member 192.168.16.1:
Jan  7 18:00:40 oracs1 openais[7014]: [TOTEM] previous ring seq 318000
rep 192.168.16.1
Jan  7 18:00:40 oracs1 openais[7014]: [TOTEM] aru 24 high delivered 24
received flag 1
Jan  7 18:00:40 oracs1 openais[7014]: [TOTEM] Did not need to
originate any messages in recovery.
Jan  7 18:00:40 oracs1 openais[7014]: [TOTEM] Sending initial ORF token
Jan  7 18:00:40 oracs1 openais[7014]: [CLM  ] CLM CONFIGURATION CHANGE
Jan  7 18:00:40 oracs1 openais[7014]: [CLM  ] New Configuration:
Jan  7 18:00:40 oracs1 openais[7014]: [CLM  ]   r(0) ip(192.168.16.1)
Jan  7 18:00:40 oracs1 openais[7014]: [CLM  ] Members Left:
Jan  7 18:00:40 oracs1 openais[7014]: [CLM  ]   r(0) ip(192.168.16.8)
Jan  7 18:00:40 oracs1 openais[7014]: [CLM  ] Members Joined:
Jan  7 18:00:40 oracs1 openais[7014]: [CLM  ] CLM CONFIGURATION CHANGE
Jan  7 18:00:40 oracs1 openais[7014]: [CLM  ] New Configuration:
Jan  7 18:00:40 oracs1 openais[7014]: [CLM  ]   r(0) ip(192.168.16.1)
Jan  7 18:00:41 oracs1 openais[7014]: [CLM  ] Members Left:
Jan  7 18:00:41 oracs1 openais[7014]: [CLM  ] Members Joined:
Jan  7 18:00:41 oracs1 openais[7014]: [SYNC ] This node is within the
primary component and will provide service.
Jan  7 18:00:41 oracs1 openais[7014]: [TOTEM] entering OPERATIONAL state.
Jan  7 18:00:41 oracs1 openais[7014]: [CLM  ] got nodejoin message 192.168.16.1
Jan  7 18:00:41 oracs1 openais[7014]: [CPG  ] got joinlist message from node 1

This takes about 2 minutes (also due to the timeouts configured for
the qdisk, cman and multipath interactions).

In total about 5 minutes. After this we can work on node2:

- unfreeze of the service
clusvcadm -U SRV

This is not enough to make the service start automatically.
clustat keeps reporting the service as "started" on the other node,
even though the node theoretically knows that the other one has left
the cluster... a sort of bug, in my opinion...

- disable of the service
clusvcadm -d SRV

- enable of the service
clusvcadm -e SRV

At this point the service starts right away; as there is only one node
alive, it is not necessary to specify the "-m" switch.
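
So, putting the node2 side together, it is something like:

clusvcadm -U SRV   # unfreeze; clustat still shows SRV "started" on the dead node
clusvcadm -d SRV   # disable, clearing the stale "started" state
clusvcadm -e SRV   # enable; with a single node alive no "-m <member>" is needed
clustat            # SRV should now be running here on node2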

After a few minutes we can restart node1, which rejoins the cluster
without problems:

Messages in /var/log/qdiskd.log of node2:
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] entering GATHER state from 11.
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] Creating commit token
because I am the rep.
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] Saving state aru 1c high
seq received 1c
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] Storing new sequence id
for ring 4da38
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] entering COMMIT state.
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] entering RECOVERY state.
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] position [0] member 192.168.16.1:
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] previous ring seq 318004
rep 192.168.16.1
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] aru 1c high delivered 1c
received flag 1
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] position [1] member 192.168.16.8:
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] previous ring seq 318004
rep 192.168.16.8
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] aru a high delivered a
received flag 1
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] Did not need to
originate any messages in recovery.
Jan  7 18:12:50 oracs1 openais[7014]: [TOTEM] Sending initial ORF token
Jan  7 18:12:50 oracs1 openais[7014]: [CLM  ] CLM CONFIGURATION CHANGE
Jan  7 18:12:50 oracs1 openais[7014]: [CLM  ] New Configuration:
Jan  7 18:12:50 oracs1 openais[7014]: [CLM  ]   r(0) ip(192.168.16.1)
Jan  7 18:12:50 oracs1 openais[7014]: [CLM  ] Members Left:
Jan  7 18:12:50 oracs1 openais[7014]: [CLM  ] Members Joined:
Jan  7 18:12:50 oracs1 openais[7014]: [CLM  ] CLM CONFIGURATION CHANGE
Jan  7 18:12:51 oracs1 openais[7014]: [CLM  ] New Configuration:
Jan  7 18:12:51 oracs1 openais[7014]: [CLM  ]   r(0) ip(192.168.16.1)
Jan  7 18:12:51 oracs1 openais[7014]: [CLM  ]   r(0) ip(192.168.16.8)
Jan  7 18:12:51 oracs1 openais[7014]: [CLM  ] Members Left:
Jan  7 18:12:51 oracs1 openais[7014]: [CLM  ] Members Joined:
Jan  7 18:12:51 oracs1 openais[7014]: [CLM  ]   r(0) ip(192.168.16.8)
Jan  7 18:12:51 oracs1 openais[7014]: [SYNC ] This node is within the
primary component and will provide service.
Jan  7 18:12:51 oracs1 openais[7014]: [TOTEM] entering OPERATIONAL state.
Jan  7 18:12:51 oracs1 openais[7014]: [CLM  ] got nodejoin message 192.168.16.1
Jan  7 18:12:51 oracs1 openais[7014]: [CLM  ] got nodejoin message 192.168.16.8
Jan  7 18:12:51 oracs1 openais[7014]: [CPG  ] got joinlist message from node 1
Jan  7 18:13:20 oracs1 qdiskd[7043]: <debug> Node 2 is UP

So the steps above let us switch the DB cleanly, with these limitations:
1) it takes about 10-15 minutes to have the whole cluster up again
with both nodes active;
2) we have to shut down one node, and in clusters running more than
one service this could make the whole approach a non-starter.

Any hints?

Thanks,
Gianluca



