[Linux-cluster] qdiskd not properly failing nodes??

danwest at comcast.net danwest at comcast.net
Tue Sep 12 16:12:30 UTC 2006


Below is the qdisk configuration for a simple 2-node cluster running a webserver service.  The quorum daemon is configured with the 3 heuristics shown.
 
<quorumd interval="1" tko="10" votes="3" device="/dev/disk/by-name/36006048000018794040051524d304234"
         status_file="/tmp/qdisk_status" log_level="7">
    <heuristic program="ping X.X.X.X -c1 -t1" score="1" interval="2"/>
    <heuristic program="[ -f /quorum ]" score="1" interval="2"/>
    <heuristic program="curl -s http://X.X.X.X | grep CLUSTER >> /dev/null" score="2" interval="1"/>
</quorumd>
 
# cat /tmp/qdisk_status
Node ID: 1
Score (current / min req. / max allowed): 4 / 2 / 4
Current state: Master
Current disk state: None
Visible Set: { 1 2 }
Master Node ID: 1
Quorate Set: { 1 2 }
 
Making the last 2 heuristics fail drops the score to 1, below the required minimum of 2 (half of the maximum of 4), which in theory should reboot the node.  So far I get confirmation in /var/log/messages but no actual reboot (see below).  The webserver service also remains on the node whose score dropped below the minimum.
 
# cat /tmp/qdisk_status
Node ID: 1
Score (current / min req. / max allowed): 1 / 2 / 4
Current state: None
Current disk state: None
Visible Set: { 1 2 }
Master Node ID: 2
Quorate Set: { }
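
For reference, the score arithmetic behind the two status outputs can be sketched in Python as follows. This is only an illustration, not qdiskd's actual code: the function names are made up, and the minimum is assumed to default to floor((n + 1) / 2) when no min_score is set, which matches the "min req." value of 2 reported above.

```python
import xml.etree.ElementTree as ET

# The quorumd block from this post, unchanged apart from layout.
QUORUMD_XML = """
<quorumd interval="1" tko="10" votes="3"
         device="/dev/disk/by-name/36006048000018794040051524d304234"
         status_file="/tmp/qdisk_status" log_level="7">
    <heuristic program="ping X.X.X.X -c1 -t1" score="1" interval="2"/>
    <heuristic program="[ -f /quorum ]" score="1" interval="2"/>
    <heuristic program="curl -s http://X.X.X.X | grep CLUSTER >> /dev/null" score="2" interval="1"/>
</quorumd>
"""


def heuristic_scores(xml_text):
    """Return the score of each <heuristic> element, in document order."""
    root = ET.fromstring(xml_text)
    return [int(h.get("score", "1")) for h in root.findall("heuristic")]


def min_required(max_score):
    """Assumed default minimum when min_score is unset: floor((n + 1) / 2)."""
    return (max_score + 1) // 2


def current_score(scores, passing):
    """Sum the scores of the heuristics that are currently passing."""
    return sum(score for score, ok in zip(scores, passing) if ok)


scores = heuristic_scores(QUORUMD_XML)
max_score = sum(scores)
needed = min_required(max_score)

# All three heuristics passing (first status output).
healthy = current_score(scores, [True, True, True])
# Only the ping heuristic passing (second status output).
degraded = current_score(scores, [True, False, False])

for label, score in (("healthy", healthy), ("degraded", degraded)):
    verdict = "sufficient" if score >= needed else "insufficient"
    print(f"{label}: {score} / {needed} / {max_score} -> {verdict}")
```

With the last 2 heuristics failing this gives 1 / 2 / 4, the same numbers qdiskd reports, so the score calculation itself looks right and the question is only about what happens after the downgrade.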
 
/var/log/messages
 
Sep 12 11:34:02 SERVER1 qdiskd[7495]: <notice> Score insufficient for master operation (1/2; max=4); downgrading
Sep 12 11:34:04 SERVER1 qdiskd[7495]: <info> Node 2 is the master
 
Sep 12 11:34:02 SERVER2 qdiskd[9780]: <info> Node 1 shutdown
Sep 12 11:34:02 SERVER2 qdiskd[9780]: <debug> Making bid for master
Sep 12 11:34:03 SERVER2 qdiskd[9780]: <info> Assuming master role
 
Any idea why the server is not getting rebooted/fenced?
 
Thanks,
 Dan



