
[Linux-cluster] Adding new file system caused problems



I am running a two-node cluster on CentOS 5 that is basically being used as a NAS head for our iSCSI-based storage.  Here are the related RPMs and the versions I am using:
kmod-gfs-0.1.16-5.2.6.18_8.1.14.el5
kmod-gfs-0.1.16-6.2.6.18_8.1.15.el5
system-config-lvm-1.0.22-1.0.el5
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5.centos
gfs-utils-0.1.11-3.el5
lvm2-2.02.16-3.el5
lvm2-cluster-2.02.16-3.el5

This morning I created a 100GB volume on our storage unit and proceeded to make it available to the cluster so it could be served via NFS to a client on our network.  I used pvcreate and vgcreate as I always do and created a new volume group.  When I ran lvcreate to create the logical volume, I saw this message:
Error locking on node nfs1-cluster.nws.noaa.gov: Volume group for uuid not found: 9crOQoM3V0fcuZ1E2163k9vdRLK7njfvnIIMTLPGreuvGmdB1aqx6KR4t7mmDRDs
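For reference, the create sequence was along these lines (the device path and VG/LV names below are placeholders, not the actual ones):

```shell
# Illustrative only -- device path and VG/LV names are placeholders.
pvcreate /dev/sdX                            # initialize the new iSCSI LUN
vgcreate -cy VolGroupNew /dev/sdX            # -cy marks the VG as clustered (clvmd locking)
lvcreate -L 100G -n LogVol-new VolGroupNew   # this is the step that produced the error
```

I don't recall whether -cy was given explicitly; with locking_type = 3 in lvm.conf and clvmd running, vgcreate should create a clustered VG by default anyway.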

I figured I had done something wrong and tried to remove the logical volume, but couldn't.  lvdisplay showed that the logical volume had been created, and vgdisplay looked good except that the volume was not active.  So I ran vgchange -aly <Volumegroupname>, which returned no error but also did not activate the volume.  I then rebooted the node, which made everything OK: I could now see the VG and the logical volume, both were active, and I could create the GFS file system on the logical volume.  The file system mounted, and I thought I was in the clear.
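The file system itself was made the usual way for GFS on CentOS 5, something like this (the cluster name ohd_cluster matches my cluster.conf below; the LV path and fs name here are just examples, and -j 2 matches our two nodes):

```shell
# LV path and "newfs" label are examples, not the real names.
gfs_mkfs -p lock_dlm -t ohd_cluster:newfs -j 2 /dev/VolGroupNew/LogVol-new
mount -t gfs /dev/VolGroupNew/LogVol-new /fs/new
```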

However, node #2 wasn't picking up the new file system at all.  I stopped the cluster services on that node, which all stopped cleanly, and then tried to restart them.  cman started fine, but clvmd didn't: it hung on the vgscan.  Even after a reboot of node #2, clvmd would still hang on the vgscan.  It wasn't until I shut down both nodes completely and restarted the cluster that both nodes could see the new file system.
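Concretely, the restart attempt on node #2 looked like this (init script names as shipped with CentOS 5; exact ordering from memory):

```shell
# Stop sequence -- all of these stopped cleanly:
service rgmanager stop
service gfs stop
service clvmd stop
service cman stop

# Start sequence:
service cman start     # started fine
service clvmd start    # hung here, in the vgscan it runs at startup
```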

I'm sure it's my own ignorance that's making this more difficult than it needs to be.  Am I missing a step?  Is more information required to help?  Any assistance in figuring out what happened here would be greatly appreciated.  I know I'm going to need to do similar tasks in the future and obviously can't afford to bring everything down just so the cluster can see a new file system.

Thank you,

Randy

P.S.  Here is my cluster.conf:
[root@nfs2-cluster ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="ohd_cluster" config_version="114" name="ohd_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="60"/>
        <clusternodes>
                <clusternode name="nfs1-cluster.nws.noaa.gov" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="nfspower" port="8" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="nfs2-cluster.nws.noaa.gov" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="nfspower" port="7" switch="1"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <rm>
                <failoverdomains>
                        <failoverdomain name="nfs-failover" ordered="0" restricted="1">
                                <failoverdomainnode name="nfs1-cluster.nws.noaa.gov" priority="1"/>
                                <failoverdomainnode name="nfs2-cluster.nws.noaa.gov" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="140.90.91.244" monitor_link="1"/>
                        <clusterfs device="/dev/VolGroupFS/LogVol-shared" force_unmount="0" fsid="30647" fstype="gfs" mountpoint="/fs/shared" name="fs-shared" options="acl"/>
                        <nfsexport name="fs-shared-exp"/>
                        <nfsclient name="fs-shared-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
                        <clusterfs device="/dev/VolGroupTemp/LogVol-rfcdata" force_unmount="0" fsid="54233" fstype="gfs" mountpoint="/rfcdata" name="rfcdata" options="acl"/>
                        <nfsexport name="rfcdata-exp"/>
                        <nfsclient name="rfcdata-client" options="no_root_squash,rw" path="" target="140.90.91.0/24"/>
                </resources>
                <service autostart="1" domain="nfs-failover" name="nfs">
                        <clusterfs ref="fs-shared">
                                <nfsexport ref="fs-shared-exp">
                                        <nfsclient ref="fs-shared-client"/>
                                </nfsexport>
                        </clusterfs>
                        <ip ref="140.90.91.244"/>
                        <clusterfs ref="rfcdata">
                                <nfsexport ref="rfcdata-exp">
                                        <nfsclient ref="rfcdata-client"/>
                                </nfsexport>
                                <ip ref="140.90.91.244"/>
                        </clusterfs>
                </service>
        </rm>
        <fencedevices>
                <fencedevice agent="fence_apc" ipaddr="192.168.42.30" login="rbrown" name="nfspower" passwd="XXXXXXX"/>
        </fencedevices>
</cluster>
--
Randy Brown
Senior Systems Administrator
National Weather Service, Office of Hydrologic Development
1325 East West Highway, Silver Spring, MD 20910, USA
Tel (work): 301-713-1669 x110
Email: randy brown noaa gov
http://www.nws.noaa.gov/ohd/
