We've got a 4-node cluster running RHEL 6.2. As part of the
cluster, we've got several gfs2 filesystems. We've often noticed that
when we reboot a single node in the cluster, the gfs2 mounts take a long
time -- eventually getting the 120-second delay messages. When we
migrated to 6.2, the default mount script echoed the filesystem being
mounted, and we discovered that the long delays were
filesystem-dependent. In particular, two filesystems were causing all
of the problems, both of which had >1M files in them. We also noticed
that dlm_recoverd on one of the other nodes accumulates a lot of time
when this is happening. Is this expected? Are there non-linear
handshaking algorithms between the mounting node and the cluster that
are dependent on the number of files?
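
In case anyone wants to reproduce the per-filesystem timings, here is a
minimal sketch of how the mounts could be timed one at a time. It is not
the init script we actually run; the function names are made up for
illustration, and it assumes every gfs2 filesystem has an /etc/fstab
entry so that "mount <mountpoint>" is enough to mount it.

```python
import subprocess
import time


def gfs2_mount_points(fstab_text):
    """Return the mount points of all gfs2 entries in fstab-formatted text."""
    points = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "gfs2":
            points.append(fields[1])
    return points


def time_mount(mount_point):
    """Run 'mount <mount_point>' and return (elapsed seconds, exit code)."""
    start = time.time()
    rc = subprocess.call(["mount", mount_point])
    return time.time() - start, rc


def report(fstab_path="/etc/fstab"):
    """Mount each gfs2 entry in turn and print how long each one took."""
    with open(fstab_path) as f:
        for mp in gfs2_mount_points(f.read()):
            elapsed, rc = time_mount(mp)
            print("%s: %.1fs (exit %d)" % (mp, elapsed, rc))
```

Calling report() as root on the rebooted node, with the gfs2 filesystems
not yet mounted, should print one line per filesystem, which makes the
slow ones easy to spot.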
Thanks in advance!
Linux-cluster mailing list
Linux-cluster redhat com