
RE: [Linux-cluster] Adding new file system caused problems



I think this is something we see. The workaround has basically been to disable clustering (LVM-wise) when doing this kind of change, and to handle it manually:

 

i.e.:

 

vgchange -c n <vg> to disable the cluster flag

lvmconf --disable-cluster on all nodes

rescan/discover lun, whatever, on all nodes

lvcreate on one node

lvchange --refresh on every node

lvchange -a y on one node

gfs_grow on one host (you can run this on the other to confirm, it should say it can’t grow anymore)
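
As a rough sketch only, the steps above could be written down as a dry-run script that prints each command next to the node it would run on, without executing anything. The volume group, logical volume, size, mount point and node names are all made-up placeholders (nothing here comes from the original poster's setup), and the LUN rescan step is left as a comment because it depends on the storage in use.

```shell
#!/bin/sh
# Dry-run sketch: prints the workaround steps in order; it does NOT run them.
# VG, LV, SIZE, MOUNTPOINT and NODES are hypothetical placeholders.
VG="${VG:-VolGroupNew}"
LV="${LV:-LogVol-new}"
SIZE="${SIZE:-100G}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/new}"
NODES="${NODES:-node1 node2}"

print_plan() {
    # The first node in the list is the "one node" used for single-node steps.
    first=${NODES%% *}

    echo "[$first] vgchange -c n $VG"
    for n in $NODES; do echo "[$n] lvmconf --disable-cluster"; done
    for n in $NODES; do echo "[$n] # rescan/discover the LUN (site-specific)"; done
    echo "[$first] lvcreate -L $SIZE -n $LV $VG"
    for n in $NODES; do echo "[$n] lvchange --refresh $VG/$LV"; done
    echo "[$first] lvchange -a y $VG/$LV"
    echo "[$first] gfs_grow $MOUNTPOINT"
}

print_plan
```

Reading the printed plan before typing anything makes it easier to see which steps go on every node and which go on just one.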

 

When done, I’ve been putting things back how they were with vgchange -c y and lvmconf --enable-cluster, though I think if you just left it unclustered it’d be fine… What you won’t want to do is leave the VG clustered without running lvmconf --enable-cluster: if you do, the clustered volume groups won’t be activated when you reboot.
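
A small check along these lines might help catch the "VG clustered but cluster locking not enabled" case before a reboot. It is only a sketch and rests on two assumptions: that the sixth character of the vg_attr string reported by vgs is "c" for a clustered volume group, and that lvmconf --enable-cluster sets locking_type = 3 in /etc/lvm/lvm.conf. The function name is made up for illustration.

```shell
#!/bin/sh
# check_cluster_mismatch ATTR LOCKING_TYPE   (hypothetical helper)
#   ATTR          vg_attr string, e.g. from: vgs --noheadings -o vg_attr <vg>
#   LOCKING_TYPE  value of locking_type from /etc/lvm/lvm.conf
# Prints "mismatch" (and returns 1) if the VG is flagged clustered
# while cluster locking (assumed to be locking_type 3) is not enabled.
check_cluster_mismatch() {
    attr=$1
    locking=$2
    # Assumption: the sixth attribute character is "c" for a clustered VG.
    clustered=$(printf '%s' "$attr" | cut -c6)
    if [ "$clustered" = "c" ] && [ "$locking" != "3" ]; then
        echo "mismatch"
        return 1
    fi
    echo "ok"
}
```

For example, check_cluster_mismatch wz--nc 1 would report a mismatch, which is the state to fix (either vgchange -c n or lvmconf --enable-cluster) before rebooting.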

 

Hope this helps… if anyone knows of a definitive fix for this I’d like to hear about it, we haven’t pushed for it since it isn’t too big of a hassle and we aren’t constantly adding new volumes, but it is a pain.

 

Brian Fair, UNIX Administrator, CitiStreet

904.791.2662

 

 

 

From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of Randy Brown
Sent: Tuesday, November 27, 2007 12:23 PM
To: linux clustering
Subject: [Linux-cluster] Adding new file system caused problems

 

I am running a two-node cluster using CentOS 5 that is basically being used as a NAS head for our iSCSI-based storage.  Here are the related rpms and their versions I am using:
kmod-gfs-0.1.16-5.2.6.18_8.1.14.el5
kmod-gfs-0.1.16-6.2.6.18_8.1.15.el5
system-config-lvm-1.0.22-1.0.el5
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5.centos
gfs-utils-0.1.11-3.el5
lvm2-2.02.16-3.el5
lvm2-cluster-2.02.16-3.el5

This morning I created a 100GB volume on our storage unit and proceeded to make it available to the cluster so it could be served via NFS to a client on our network.  I used pvcreate and vgcreate as I always do and created a new volume group.  When I went to create the logical volume I saw this message:
Error locking on node nfs1-cluster.nws.noaa.gov: Volume group for uuid not found: 9crOQoM3V0fcuZ1E2163k9vdRLK7njfvnIIMTLPGreuvGmdB1aqx6KR4t7mmDRDs

I figured I had done something wrong and tried to remove the lvol, but couldn't.  lvdisplay showed that the logical volume had been created, and vgdisplay looked good except that the volume was not activated.  So I ran vgchange -aly <Volumegroupname>, which didn't return any error but also did not activate the volume.  I then rebooted the node, which made everything OK: I could now see the VG and lvol, both were active, and I could create the GFS file system on the lvol.  The file system mounted and I thought I was in the clear.

However, node #2 wasn't picking this new filesystem up at all.  I stopped the cluster services on this node which all stopped cleanly and then tried to restart them.  cman started fine but clvmd didn't.  It hung on the vgscan.   Even after a reboot of node #2, clvmd would not start and would hang on the vgscan.  It wasn't until I shut down both nodes completely and started cluster that both nodes could see the new filesystem.

I'm sure it's my own ignorance that's making this more difficult than it needs to be.  Am I missing a step?  Is more information required to help?  Any assistance in figuring out what happened here would be greatly appreciated.  I know I am going to need to do similar tasks in the future and obviously can't afford to bring everything down in order for the cluster to see a new filesystem.

Thank you,

Randy

P.S.  Here is my cluster.conf:
[root nfs2-cluster ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="ohd_cluster" config_version="114" name="ohd_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="60"/>
        <clusternodes>
                <clusternode name="nfs1-cluster.nws.noaa.gov" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="nfspower" port="8" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="nfs2-cluster.nws.noaa.gov" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="nfspower" port="7" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="nfs-failover" ordered="0" restricted="1">
                                <failoverdomainnode name="nfs1-cluster.nws.noaa.gov" priority="1"/>
                                <failoverdomainnode name="nfs2-cluster.nws.noaa.gov" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="140.90.91.244" monitor_link="1"/>
                        <clusterfs device="/dev/VolGroupFS/LogVol-shared" force_unmount="0" fsid="30647" fstype="gfs" mountpoint="/fs/shared" name="fs-shared" options="acl"/>
                        <nfsexport name="fs-shared-exp"/>
                        <nfsclient name="fs-shared-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
                        <clusterfs device="/dev/VolGroupTemp/LogVol-rfcdata" force_unmount="0" fsid="54233" fstype="gfs" mountpoint="/rfcdata" name="rfcdata" options="acl"/>
                        <nfsexport name="rfcdata-exp"/>
                        <nfsclient name="rfcdata-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
                </resources>
                <service autostart="1" domain="nfs-failover" name="nfs">
                        <clusterfs ref="fs-shared">
                                <nfsexport ref="fs-shared-exp">
                                        <nfsclient ref="fs-shared-client"/>
                                </nfsexport>
                        </clusterfs>
                        <ip ref="140.90.91.244"/>
                        <clusterfs ref="rfcdata">
                                <nfsexport ref="rfcdata-exp">
                                        <nfsclient ref="rfcdata-client"/>
                                </nfsexport>
                                <ip ref="140.90.91.244"/>
                        </clusterfs>
                </service>
        </rm>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.42.30" login="rbrown" name="nfspower" passwd="XXXXXXX"/>
        </fencedevices>
</cluster>

