[Linux-cluster] Adding new file system caused problems

jr johannes.russek at io-consulting.net
Fri Nov 30 17:05:22 UTC 2007


Is this a bug?
I'm getting the exact same thing, only during setup of a new clustered
volume group, no resize or anything.
How risky would it be to leave the LVM underneath the GFS non-clustered?
I can't restart the whole cluster every time I add a new clustered
filesystem.
regards,
johannes

On Friday, 30.11.2007, at 09:34 -0500, Fair, Brian wrote:
> I think this is something we see too. The workaround has basically been
> to disable clustering (LVM-wise) when doing this kind of change, and to
> handle it manually:
> 
>  
> 
> I.e.:
> 
>  
> 
> vgchange -c n <vg> to disable the cluster flag
> lvmconf --disable-cluster on all nodes
> rescan/discover the LUN, whatever, on all nodes
> lvcreate on one node
> lvchange --refresh on every node
> lvchange -a y on one node
> gfs_grow on one host (you can run this on the other to confirm; it
> should say it can't grow any more). The whole sequence is sketched
> below.
> 
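> Put together, it looks roughly like this (vgdata, lvnew, the 100G size
> and the mountpoint are just placeholders):
> 
>     # on ONE node: clear the clustered flag on the VG
>     vgchange -c n vgdata
>     # on ALL nodes: switch LVM to local locking
>     lvmconf --disable-cluster
>     # on ALL nodes: rescan/discover the new LUN (method depends on your storage)
>     # on ONE node: create the new LV
>     lvcreate -L 100G -n lvnew vgdata
>     # on EVERY node: re-read the LV metadata
>     lvchange --refresh vgdata/lvnew
>     # on ONE node: activate it
>     lvchange -a y vgdata/lvnew
>     # on ONE host: grow the GFS (only for the resize case)
>     gfs_grow /path/to/gfs/mountpoint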
> 
> When done, I've been putting things back how they were with vgchange
> -c y and lvmconf --enable-cluster, though I think if you just left it
> unclustered it'd be fine. What you won't want to do is leave the VG
> clustered but skip lvmconf --enable-cluster; if you do that, the
> clustered volume groups won't be activated when you reboot.
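> 
> That is, to restore the clustered setup afterwards, something like
> this (same placeholder VG name as above):
> 
>     # on ONE node: mark the VG clustered again
>     vgchange -c y vgdata
>     # on ALL nodes: turn cluster locking back on so clvmd handles it at boot
>     lvmconf --enable-cluster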
> 
>  
> 
> Hope this helps. If anyone knows of a definitive fix for this I'd like
> to hear about it; we haven't pushed for one since it isn't too big of
> a hassle and we aren't constantly adding new volumes, but it is a
> pain.
> 
>  
> 
> Brian Fair, UNIX Administrator, CitiStreet
> 
> 904.791.2662
> 
> 
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Randy Brown
> Sent: Tuesday, November 27, 2007 12:23 PM
> To: linux clustering
> Subject: [Linux-cluster] Adding new file system caused problems
> 
> 
>  
> 
> I am running a two-node cluster on CentOS 5 that is basically being
> used as a NAS head for our iSCSI-based storage.  Here are the related
> RPMs and the versions I am using:
> kmod-gfs-0.1.16-5.2.6.18_8.1.14.el5
> kmod-gfs-0.1.16-6.2.6.18_8.1.15.el5
> system-config-lvm-1.0.22-1.0.el5
> cman-2.0.64-1.0.1.el5
> rgmanager-2.0.24-1.el5.centos
> gfs-utils-0.1.11-3.el5
> lvm2-2.02.16-3.el5
> lvm2-cluster-2.02.16-3.el5
> 
> This morning I created a 100GB volume on our storage unit and
> proceeded to make it available to the cluster so it could be served
> via NFS to a client on our network.  I used pvcreate and vgcreate as I
> always do and created a new volume group.  When I went to create the
> logical volume I saw this message:
> Error locking on node nfs1-cluster.nws.noaa.gov: Volume group for uuid
> not found:
> 9crOQoM3V0fcuZ1E2163k9vdRLK7njfvnIIMTLPGreuvGmdB1aqx6KR4t7mmDRDs
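> 
> (For reference, the creation sequence was along these lines; the
> device path and names here are just placeholders:)
> 
>     pvcreate /dev/sdX                            # the new 100GB iSCSI LUN
>     vgcreate VolGroupNew /dev/sdX                # clustered by default with cluster locking enabled
>     lvcreate -L 100G -n LogVol-new VolGroupNew   # this is where the locking error appeared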
> 
> I figured I had done something wrong and tried to remove the LV, but
> couldn't.  lvdisplay showed that the LV had been created, and
> vgdisplay looked good with the exception of the volume not being
> activated.  So I ran vgchange -aly <Volumegroupname>, which didn't
> return any error but also did not activate the volume.  I then
> rebooted the node, which made everything OK.  I could now see the VG
> and LV, both were active, and I could now create the GFS file system
> on the LV.  The file system mounted and I thought I was in the
> clear.
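> 
> (Concretely, that last part was something like the following; the
> VG/LV names and mount point are placeholders, ohd_cluster is the
> cluster name from the cluster.conf below, and -j 2 assumes one
> journal per node:)
> 
>     vgchange -aly VolGroupNew             # activate locally; this is what finally worked after the reboot
>     gfs_mkfs -p lock_dlm -t ohd_cluster:newfs -j 2 /dev/VolGroupNew/LogVol-new
>     mount -t gfs /dev/VolGroupNew/LogVol-new /fs/new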
> 
> However, node #2 wasn't picking this new filesystem up at all.  I
> stopped the cluster services on this node, which all stopped cleanly,
> and then tried to restart them.  cman started fine, but clvmd didn't:
> it hung on the vgscan.  Even after a reboot of node #2, clvmd would
> not start and would hang on the vgscan.  It wasn't until I shut down
> both nodes completely and restarted the cluster that both nodes could
> see the new filesystem.
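> 
> (On node #2 that was roughly the usual init-script sequence on CentOS
> 5; the exact stop order below is an assumption on my part:)
> 
>     service rgmanager stop; service gfs stop; service clvmd stop; service cman stop
>     service cman start     # came back fine
>     service clvmd start    # hung at the vgscan stage, as described above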
> 
> I'm sure it's my own ignorance that's making this more difficult than
> it needs to be.  Am I missing a step?  Is more information required to
> help?  Any assistance in figuring out what happened here would be
> greatly appreciated.  I know I'm going to need to do similar tasks in
> the future and obviously can't afford to bring everything down in
> order for the cluster to see a new filesystem.
> 
> Thank you,
> 
> Randy
> 
> P.S.  Here is my cluster.conf:
> [root at nfs2-cluster ~]# cat /etc/cluster/cluster.conf
> <?xml version="1.0"?>
> <cluster alias="ohd_cluster" config_version="114" name="ohd_cluster">
>         <fence_daemon post_fail_delay="0" post_join_delay="60"/>
>         <clusternodes>
>                 <clusternode name="nfs1-cluster.nws.noaa.gov" nodeid="1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="nfspower" port="8" switch="1"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="nfs2-cluster.nws.noaa.gov" nodeid="2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="nfspower" port="7" switch="1"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="nfs-failover" ordered="0" restricted="1">
>                                 <failoverdomainnode name="nfs1-cluster.nws.noaa.gov" priority="1"/>
>                                 <failoverdomainnode name="nfs2-cluster.nws.noaa.gov" priority="1"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <ip address="140.90.91.244" monitor_link="1"/>
>                         <clusterfs device="/dev/VolGroupFS/LogVol-shared" force_unmount="0" fsid="30647" fstype="gfs" mountpoint="/fs/shared" name="fs-shared" options="acl"/>
>                         <nfsexport name="fs-shared-exp"/>
>                         <nfsclient name="fs-shared-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
>                         <clusterfs device="/dev/VolGroupTemp/LogVol-rfcdata" force_unmount="0" fsid="54233" fstype="gfs" mountpoint="/rfcdata" name="rfcdata" options="acl"/>
>                         <nfsexport name="rfcdata-exp"/>
>                         <nfsclient name="rfcdata-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
>                 </resources>
>                 <service autostart="1" domain="nfs-failover" name="nfs">
>                         <clusterfs ref="fs-shared">
>                                 <nfsexport ref="fs-shared-exp">
>                                         <nfsclient ref="fs-shared-client"/>
>                                 </nfsexport>
>                         </clusterfs>
>                         <ip ref="140.90.91.244"/>
>                         <clusterfs ref="rfcdata">
>                                 <nfsexport ref="rfcdata-exp">
>                                         <nfsclient ref="rfcdata-client"/>
>                                 </nfsexport>
>                                 <ip ref="140.90.91.244"/>
>                         </clusterfs>
>                 </service>
>         </rm>
>         <fencedevices>
>                 <fencedevice agent="fence_apc" ipaddr="192.168.42.30" login="rbrown" name="nfspower" passwd="XXXXXXX"/>
>         </fencedevices>
> </cluster>
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster



