[Linux-cluster] Running GFS without fencing and maybe locking ;-)

Arnd m_list at eshine.de
Wed Mar 22 15:12:16 UTC 2006


While setting up our cluster I was wondering why GFS blocks the
filesystem when one of the nodes fails (the cluster remains in state
"recover" and waits for the failed node to be fenced). Manual fencing is
quite slow (if I am the one issuing it), and since we are running more
than one service per node we cannot shut a node down via iLO or
deactivate its Brocade port:

 - The nodes have a MySQL database running completely in memory (which
is lost if fenced powers the system off)

 - The nodes have other filesystems mounted which may also fail if I
deactivate the port on the Brocade switch


Our cluster consists of four webservers and one management server. This
management server is the only server which needs write access to the GFS
(for example, to change the HTML files):

	webserver1 - webserver4: mount GFS -o ro (readonly)
	mgm-server1: mount GFS -o rw (write access)

My idea:

If one of the webservers fails, the cluster will call a fence script
that simply returns exit code 0. The cluster then considers the node
fenced, and since the filesystem was never mounted rw on that node, it
cannot be corrupted.
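The idea above could be sketched as a trivial always-succeeding fence script. All names here are hypothetical (this is not a supported agent; real deployments would use something like fence_ilo or fence_brocade):

```shell
#!/bin/sh
# fence_null.sh - hypothetical no-op "fence" helper: reports the node
# as fenced (exit code 0) without taking any action. This is only
# plausible for nodes that could never have held the GFS open rw.
fence_node() {
    node="$1"
    echo "fence_null: reporting $node as fenced (no action taken)"
    return 0
}

fence_node "${1:-webserver1}"
```

The cluster only looks at the exit code, so returning 0 is what makes fenced consider recovery complete and lets GFS leave the "recover" state.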

The only possible way the filesystem can get corrupted is when the
management-server fails.

So is it possible to run GFS with four read-only nodes and only one
node that needs to be taken care of if it fails? How does locking
(lock_dlm) work in this case? I suppose it only needs to handle writes
to the filesystem, but here I might be wrong?!

Can I use lock_nolock (when making the filesystem) if only one node is
writing to the GFS?
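That is, something like the following, where the device path and mount point are placeholders (a sketch of the GFS1 tooling; the lock protocol is set at mkfs time but can also be overridden per mount):

```shell
# Create the filesystem with no cluster locking and a single journal
# (placeholder device path):
gfs_mkfs -p lock_nolock -j 1 /dev/cluster_vg/gfs_lv

# Alternatively, keep lock_dlm on disk and override the lock protocol
# at mount time on the single writer:
mount -t gfs -o lockproto=lock_nolock /dev/cluster_vg/gfs_lv /var/www
```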

Arnd



