[Linux-cluster] Freezing GFS mount in a cluster

Kees Hoekzema kees at tweakers.net
Tue Jul 8 12:25:31 UTC 2008


Hello List,

Recently we bought a Dell MD3000i iSCSI storage system and we are trying to
get GFS running on it. I have 3 test servers hooked up to the MD3000i and I
have the cluster working, including multipath I/O across the different paths.

When I had the cluster up, with all 3 nodes in the fence domain and cman_tool
status reporting 3 nodes, I made a GFS partition and formatted it:
# gfs_mkfs -j 10 -p lock_dlm -t tweakers:webdata /dev/mapper/webdata-part1

This worked, and I could mount the filesystem on the server I created it on.
However, as soon as I tried to mount it on one of the two other servers, that
server would freeze and get fenced. After a fresh reboot of the complete
cluster I tried to mount it again. The first server could mount it, but any
other server that tried to mount it while the first server had the GFS
filesystem mounted would crash.
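For reference, the -t argument to gfs_mkfs has the form clustername:fsname, and the cluster part has to match the name attribute of the <cluster> element in cluster.conf ("tweakers" here). The -j option sets the number of journals, and every node that mounts the filesystem needs a journal of its own. The following is a minimal sketch, using only Python's standard library, that checks both against a cluster.conf. The function name check_gfs_mkfs_args and the trimmed-down CLUSTER_CONF string are illustrative assumptions, not part of any cluster tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down copy of cluster.conf: only the cluster name
# and the node list matter for this check.
CLUSTER_CONF = """<cluster name="tweakers" config_version="4">
  <clusternodes>
    <clusternode name="ares" nodeid="1" votes="1"/>
    <clusternode name="abaris" nodeid="2" votes="1"/>
    <clusternode name="adonis" nodeid="3" votes="1"/>
  </clusternodes>
</cluster>"""


def check_gfs_mkfs_args(lock_table, journals, conf_xml):
    """Return a list of problems; an empty list means the arguments look consistent."""
    root = ET.fromstring(conf_xml)
    problems = []

    # The lock table name must be "clustername:fsname".
    cluster_name, sep, fs_name = lock_table.partition(":")
    if not sep or not cluster_name or not fs_name:
        problems.append(f"lock table {lock_table!r} is not of the form clustername:fsname")
    elif cluster_name != root.get("name"):
        problems.append(
            f"cluster part {cluster_name!r} does not match cluster name {root.get('name')!r}"
        )

    # Each node that mounts the filesystem needs its own journal.
    node_count = len(root.findall("./clusternodes/clusternode"))
    if journals < node_count:
        problems.append(f"{journals} journals for {node_count} nodes")

    return problems


if __name__ == "__main__":
    # Arguments from: gfs_mkfs -j 10 -p lock_dlm -t tweakers:webdata ...
    print(check_gfs_mkfs_args("tweakers:webdata", 10, CLUSTER_CONF))  # prints []
```

With the arguments used above, the check finds nothing wrong, which suggests the mount failures are not caused by a mismatched lock table name or too few journals.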

As I'm fairly new to cman/fencing/GFS clusters, I was wondering whether this is
a 'silly' configuration error, or whether something is seriously
wrong.

Another thing I would like to know is where to get debug information. Right
now there is not a lot of debug information available, or at least I couldn't
find it. One thing that particularly annoyed me was the 'Waiting for fenced
to join the fence group.' message, which didn't come with any explanation
whatsoever. That message finally went away when I powered up the two other
servers and started the cluster on all three simultaneously.

Anyway, here is my cluster config for this testing. I use manual fencing for
testing, as the environment I test it in does not have exactly the same
hardware as the production environment.

<?xml version="1.0"?>
<cluster name="tweakers" config_version="4">
  <cman expected_votes="1">
  </cman>
  <clusternodes>

    <clusternode name="ares" nodeid="1" votes="1">
      <fence>
        <method name="human">
          <device name="last_resort" ipaddr="node1"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="abaris" nodeid="2" votes="1">
      <fence>
        <method name="human">
          <device name="last_resort" ipaddr="node2"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="adonis" nodeid="3" votes="1">
      <fence>
        <method name="human">
          <device name="last_resort" ipaddr="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>

  <fencedevices>
    <fencedevice name="last_resort" agent="fence_manual"/>
  </fencedevices>
  <fence_daemon clean_start="0"/>
  <fence_daemon post_join_delay="30"/>
  <fence_daemon post_fail_delay="30"/>
  <rm log_level="7" log_facility="syslog">
  </rm>

</cluster>
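To double-check what the configuration above actually declares, here is a small sketch using Python's standard xml.etree.ElementTree. It reads a condensed copy of the config and reports the vote totals, the ipaddr each node's fence entry passes to its device, and how many <fence_daemon> elements there are. Run against this config, it shows that adonis's fence entry repeats ipaddr="node2" (the same value as abaris), that expected_votes is 1 while the three nodes carry 3 votes in total, and that the fence_daemon settings are split over three separate elements. Whether any of these is related to the mount freeze is not established here. The summarize function and the CLUSTER_CONF string are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Condensed copy of the cluster.conf posted above (rm section omitted).
CLUSTER_CONF = """<cluster name="tweakers" config_version="4">
  <cman expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="ares" nodeid="1" votes="1">
      <fence><method name="human"><device name="last_resort" ipaddr="node1"/></method></fence>
    </clusternode>
    <clusternode name="abaris" nodeid="2" votes="1">
      <fence><method name="human"><device name="last_resort" ipaddr="node2"/></method></fence>
    </clusternode>
    <clusternode name="adonis" nodeid="3" votes="1">
      <fence><method name="human"><device name="last_resort" ipaddr="node2"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="last_resort" agent="fence_manual"/>
  </fencedevices>
  <fence_daemon clean_start="0"/>
  <fence_daemon post_join_delay="30"/>
  <fence_daemon post_fail_delay="30"/>
</cluster>"""


def summarize(conf_xml):
    """Report votes, per-node fence targets and fence_daemon element count."""
    root = ET.fromstring(conf_xml)
    nodes = root.findall("./clusternodes/clusternode")

    # Map each node name to the ipaddr given to its first fence device entry.
    fence_targets = {}
    for node in nodes:
        device = node.find("./fence/method/device")
        fence_targets[node.get("name")] = device.get("ipaddr") if device is not None else None

    # Any ipaddr used by more than one node is probably a copy/paste slip.
    counts = Counter(t for t in fence_targets.values() if t is not None)

    cman = root.find("cman")
    expected = cman.get("expected_votes") if cman is not None else None

    return {
        "total_votes": sum(int(n.get("votes", "1")) for n in nodes),
        "expected_votes": int(expected) if expected is not None else None,
        "fence_targets": fence_targets,
        "duplicate_fence_targets": sorted(t for t, c in counts.items() if c > 1),
        "fence_daemon_elements": len(root.findall("fence_daemon")),
    }


if __name__ == "__main__":
    for key, value in summarize(CLUSTER_CONF).items():
        print(f"{key}: {value}")
```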

Summary:
- Why can't I mount GFS on another server when it is already mounted on one?
- How do I get more debug information (i.e. the reason why a server can't join
the fence domain, or the reason why a server gets fenced)?

Thank you all for your time,

Kees Hoekzema




