[Linux-cluster] Error: Is lock_gulm running


Sometimes a GFS node doesn't shut down cleanly. Many error messages like "Is lock_gulm running. error_code=111" show up on the console. The node never finishes shutting down, so I have to power-reset it, and when it starts again fsck runs a filesystem check, which takes a long time. Is it possible to have the node shut down cleanly in such cases?

We use the manual method for fencing one of the nodes. There was a network outage between the master lock server and this node. The node's status was set to expired and fence_manual didn't succeed, so the node couldn't rejoin the cluster after I restarted it. fence_ack_manual -s nodeip complained that there is no /tmp/fifo.tmp file. I had to restart the cluster to get this node to join again. Is it possible to rejoin the node without restarting the cluster when this happens again?


