[Linux-cluster] Adding new file system caused problems

Fair, Brian xbfair at citistreetonline.com
Fri Nov 30 14:34:45 UTC 2007


I think this is something we see too. The workaround has basically been to
disable clustering (LVM-wise) when making this kind of change, and to
handle it manually (a consolidated sketch follows the steps below):

 

I.e.:

 

vgchange -c n <vg> to disable the cluster flag

lvmconf --disable-cluster on all nodes

rescan/discover the LUN (however your storage requires) on all nodes

lvcreate on one node

lvchange --refresh on every node

lvchange -a y on one node

gfs_grow on one host (you can run it on the other node to confirm; it
should say it can't grow any further)
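
Put together, and with made-up VG/LV names and mount point (treat this as a
sketch of the steps above, not an exact recipe):

vgchange -c n myvg             # on one node: clear the clustered flag on the VG
lvmconf --disable-cluster      # on every node: switch LVM to non-clustered locking
# rescan/discover the LUN on every node (method depends on your storage)
lvcreate -L 100G -n mylv myvg  # on one node: create the new LV
lvchange --refresh myvg/mylv   # on every node: re-read the LV metadata
lvchange -a y myvg/mylv        # on one node: activate the LV
gfs_grow /mnt/mygfs            # on one host: grow the GFS filesystem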

 

When done, I've been putting things back the way they were with vgchange -c
y and lvmconf --enable-cluster, though I think if you just left it
unclustered it'd be fine... what you won't want to do is leave the VG
clustered but skip --enable-cluster; if you do that, the clustered volume
groups won't be activated when you reboot.
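
In other words, roughly (same made-up VG name as above):

lvmconf --enable-cluster       # on every node
vgchange -c y myvg             # on one node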

 

Hope this helps... if anyone knows of a definitive fix for this I'd like
to hear about it. We haven't pushed for one since it isn't too big of a
hassle and we aren't constantly adding new volumes, but it is a pain.

 

Brian Fair, UNIX Administrator, CitiStreet

904.791.2662

 

 

 

From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Randy Brown
Sent: Tuesday, November 27, 2007 12:23 PM
To: linux clustering
Subject: [Linux-cluster] Adding new file system caused problems

 

I am running a two-node cluster using CentOS 5 that is basically being
used as a NAS head for our iSCSI-based storage.  Here are the related
RPMs and the versions I am using:
kmod-gfs-0.1.16-5.2.6.18_8.1.14.el5
kmod-gfs-0.1.16-6.2.6.18_8.1.15.el5
system-config-lvm-1.0.22-1.0.el5
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5.centos
gfs-utils-0.1.11-3.el5
lvm2-2.02.16-3.el5
lvm2-cluster-2.02.16-3.el5

This morning I created a 100GB volume on our storage unit and proceeded
to make it available to the cluster so it could be served via NFS to a
client on our network.  I used pvcreate and vgcreate as I always do and
created a new volume group.  When I went to create the logical volume I
saw this message:
Error locking on node nfs1-cluster.nws.noaa.gov: Volume group for uuid
not found:
9crOQoM3V0fcuZ1E2163k9vdRLK7njfvnIIMTLPGreuvGmdB1aqx6KR4t7mmDRDs
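
(For reference, the sequence up to that point was the standard one, roughly as
below; the device, VG, and LV names are just placeholders, not the exact ones
I used:)

pvcreate /dev/sdX                               # the new 100GB iSCSI LUN
vgcreate VolGroupNew /dev/sdX                   # new volume group
lvcreate -l 100%FREE -n LogVolNew VolGroupNew   # this is where the error above appeared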

I figured I had done something wrong and tried to remove the LV, but
couldn't.  lvdisplay showed that the logical volume had been created, and
vgdisplay looked good except that the volume was not activated.  So I ran
vgchange -aly <VolumeGroupName>, which didn't return any error but also
did not activate the volume.  I then rebooted the node, which made
everything OK.  I could now see the VG and LV, both were active, and I
could create the GFS file system on the LV.  The file system mounted and
I thought I was in the clear.
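
(The filesystem creation itself was the usual gfs_mkfs and mount, roughly as
below; the LV and mount point names are placeholders, the lock table uses the
cluster name from the cluster.conf at the end of this mail, and -j 2 is for
the two nodes:)

gfs_mkfs -p lock_dlm -t ohd_cluster:newfs -j 2 /dev/VolGroupNew/LogVolNew
mount -t gfs /dev/VolGroupNew/LogVolNew /mnt/newfs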

However, node #2 wasn't picking this new filesystem up at all.  I
stopped the cluster services on this node, which all stopped cleanly, and
then tried to restart them.  cman started fine, but clvmd didn't; it
hung on the vgscan.  Even after a reboot of node #2, clvmd would not
start and would hang on the vgscan.  It wasn't until I shut down both
nodes completely and restarted the cluster that both nodes could see the
new filesystem.
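
(By "cluster services" I mean the usual CentOS 5 init scripts, stopped and
started in roughly this order; clvmd is the one that hung inside its vgscan:)

service rgmanager stop; service gfs stop; service clvmd stop; service cman stop
service cman start; service clvmd start; service gfs start; service rgmanager start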

I'm sure it's my own ignorance that's making this more difficult than it
needs to be.  Am I missing a step?  Is more information required to
help?  Any assistance in figuring out what happened here would be
greatly appreciated.  I know I'm going to need to do similar tasks in the
future and obviously can't afford to bring everything down in order for
the cluster to see a new filesystem.

Thank you,

Randy

P.S.  Here is my cluster.conf:
[root at nfs2-cluster ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="ohd_cluster" config_version="114" name="ohd_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="60"/>
        <clusternodes>
                <clusternode name="nfs1-cluster.nws.noaa.gov" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="nfspower" port="8" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="nfs2-cluster.nws.noaa.gov" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="nfspower" port="7" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="nfs-failover" ordered="0" restricted="1">
                                <failoverdomainnode name="nfs1-cluster.nws.noaa.gov" priority="1"/>
                                <failoverdomainnode name="nfs2-cluster.nws.noaa.gov" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="140.90.91.244" monitor_link="1"/>
                        <clusterfs device="/dev/VolGroupFS/LogVol-shared" force_unmount="0" fsid="30647" fstype="gfs" mountpoint="/fs/shared" name="fs-shared" options="acl"/>
                        <nfsexport name="fs-shared-exp"/>
                        <nfsclient name="fs-shared-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
                        <clusterfs device="/dev/VolGroupTemp/LogVol-rfcdata" force_unmount="0" fsid="54233" fstype="gfs" mountpoint="/rfcdata" name="rfcdata" options="acl"/>
                        <nfsexport name="rfcdata-exp"/>
                        <nfsclient name="rfcdata-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
                </resources>
                <service autostart="1" domain="nfs-failover" name="nfs">
                        <clusterfs ref="fs-shared">
                                <nfsexport ref="fs-shared-exp">
                                        <nfsclient ref="fs-shared-client"/>
                                </nfsexport>
                        </clusterfs>
                        <ip ref="140.90.91.244"/>
                        <clusterfs ref="rfcdata">
                                <nfsexport ref="rfcdata-exp">
                                        <nfsclient ref="rfcdata-client"/>
                                </nfsexport>
                                <ip ref="140.90.91.244"/>
                        </clusterfs>
                </service>
        </rm>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.42.30" login="rbrown" name="nfspower" passwd="XXXXXXX"/>
        </fencedevices>
</cluster>
