[Linux-cluster] Expected qdiskd behaviour on node failure and reboot implementation
Rafael Micó Miranda
rmicmirregs at gmail.com
Tue Jan 19 22:27:29 UTC 2010
Hi all,
Today I was surprised by the result of a test on one of my cluster
configurations.
A) Environment
- 6 x different servers used as cluster nodes, with dual FC HBA
- iLO/DRAC fencing devices for each cluster node
- 2 x different fabrics, each built with 3 FC SAN switches
- 2 x storage arrays, with 23 270GB LUNs of data each.
- 1x Qdisk: a 24th LUN located in one of the storage arrays
- 2x qdisk heuristics
B) Test
- Unplugging the 2 service interface cables on node A
C) Expected behaviour (due to qdisk and cman timers)
- Qdiskd should notice the loss of the heuristics on node A
- CMAN should notice the loss of connectivity with node A
- The rest of the nodes should fence node A
D) Experienced behaviour
- Qdisk notices the loss of the heuristics on node A
- Qdisk reboots node A via a "hard reset"
- CMAN notices the loss of connectivity with node A
- The rest of the nodes fence it (I see the 2 reboots in the iLO log of
the system)
I was surprised that Qdisk is able to do a "hard reset" of the
system. I mean: it was not a clean shutdown of the system via a "reboot"
or "poweroff" O.S. command. It looked more like a power reset of
the system. I was expecting qdisk to stop the CMAN service or, in the
most extreme case, to do a clean reboot of the system.
After that, I found this in the qdisk man page:
"By default, only nodes scoring over 1/2 of the total maximum score will
claim they are available via the quorum disk, and a node (master or
otherwise) whose score drops too low will remove itself (usually, by
rebooting).
[...]
reboot="1"
If set to 0 (off), qdiskd will *not* reboot after a negative
transition as a result in a change in score (see section 2.2).
The default for this value is 1 (on)."
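To make the quoted rule concrete, here is a minimal sketch of the availability check exactly as the man page excerpt words it (a node claims availability only when its score is over 1/2 of the total maximum score). The function name and the heuristic scores are hypothetical, chosen for illustration; the real qdiskd threshold may be set differently in cluster.conf and may behave differently at the boundary.

```python
def claims_available(score: int, max_score: int) -> bool:
    """Return True if a node with this score would claim availability.

    Implements only the wording of the quoted excerpt: the node's score
    must be strictly greater than half of the total maximum score.
    This is an illustration, not the qdiskd source code.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    # Compare 2 * score against max_score to avoid fractional arithmetic.
    return 2 * score > max_score


# Hypothetical setup: two heuristics worth 1 point each (the post does
# not state the actual scores configured on this cluster).
heuristic_scores = [1, 1]
max_score = sum(heuristic_scores)

# Both heuristics passing: the node stays available.
print(claims_available(2, max_score))

# Both heuristics failing (e.g. both service interfaces unplugged):
# the score drops too low and, with reboot="1", the node removes itself.
print(claims_available(0, max_score))
```

Under these assumed scores, losing both heuristics on node A takes its score to 0, which is the "negative transition" the reboot option refers to.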
So my assumption was wrong and this is the default behaviour, isn't it?
I'm pretty sure I did not see this behaviour in my previous tests.
Another question is: how does qdisk implement the "reboot" function? Is
it really a "hard reset"?
Thanks in advance,
Rafael
--
Rafael Micó Miranda