
[Linux-cluster] PVFS going Wild



Hey Guys,

I just took over a couple of clusters from a sysadmin who left the company.  Unfortunately, the hand-off was less than informative.  <sigh>  So, I've got an old Linux cluster, still well-used, with a PVFS filesystem mounted at /work.  I'm new to clustering, and I sure as hell don't know much about it, but I've got a sick puppy here.  Everything points to the PVFS filesystem.

lsof: WARNING: can't stat() pvfs file system /work
      Output information may be incomplete.


In /var/log/messages:

Oct  3 13:51:34 elvis PAM_pwdb[24431]: (su) session opened for user deb_r by deb(uid=2626)
Oct  3 13:51:49 elvis kernel: (./ll_pvfs.c, 361): ll_pvfs_getmeta failed on downcall for 192.168.1.102:3000/pvfs-meta
Oct  3 13:51:49 elvis kernel: (./ll_pvfs.c, 361): ll_pvfs_getmeta failed on downcall for 192.168.1.102:3000/pvfs-meta/manaa/DFTBNEW
Oct  3 14:16:48 elvis kernel: (./ll_pvfs.c, 409): ll_pvfs_statfs failed on downcall for 192.168.1.102:3000/pvfs-meta
Oct  3 14:16:elvis kernel: (./inode.c, 321): pvfs_statfs failed
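
To see at a glance which metadata targets the kernel is complaining about, the failure messages can be boiled down to their distinct host:port/path strings. This is a minimal sketch in plain sh, assuming each message sits on a single line in the real log file; the function name pvfs_failed_targets is made up for illustration.

```shell
# Print each distinct "host:port/path" target named in PVFS downcall failures.
# Reads log text on stdin; assumes one message per line.
pvfs_failed_targets() {
    sed -n 's/.*failed on downcall for \([^ ]*\).*/\1/p' | sort -u
}
```

It could be run as `pvfs_failed_targets < /var/log/messages`.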

The system:

Linux elvis 2.2.19-13.beosmp #1 SMP Tue Aug 21 20:04:44 EDT 2001 i686 unknown

Red Hat Linux release 6.2 (Zoot)

I can't access /work from the master or any of the nodes:

elvis [49#] ls /work
ls: /work: Too many open files
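
"Too many open files" is normally the text for EMFILE, the per-process descriptor limit, so one thing that might be worth checking is how many descriptors a suspect daemon is holding. The sketch below is plain sh and counts the entries under /proc/PID/fd; the function name count_fds is hypothetical, and which process has actually hit its limit (a local pvfsd, or something on the metadata server) is an assumption to verify.

```shell
# Print the number of open file descriptors held by the process with PID $1.
# Relies on the Linux /proc/PID/fd directory being readable.
count_fds() {
    ls "/proc/$1/fd" | wc -l | tr -d ' '
}
```

For example, `count_fds 1234` for a daemon with PID 1234, compared against the limit shown by `ulimit -n` in the shell that started it.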


I ran a script in /usr/bin called pvfs_client_stop.sh, which killed all the pvfs daemons, etc.:

#!/bin/tcsh

# Phil Carns
# pcarns hubcap clemson edu
#
# This is an example script for how to get Scyld Beowulf cluster nodes
# to mount a PVFS file system.

set PVFSD = "/usr/sbin/pvfsd"
set PVFSMOD = "pvfs"
set PVFS_CLIENT_MOUNT_DIR = "/work"
set MOUNT_PVFS = "/sbin/mount.pvfs"

# unmount the file system locally and on all of the slave nodes
/bin/umount $PVFS_CLIENT_MOUNT_DIR
bpsh -pad /bin/umount $PVFS_CLIENT_MOUNT_DIR

# kill all of the  pvfsd client daemons
/usr/bin/killall pvfsd

# remove the pvfs module on the local and the slave nodes
/sbin/rmmod $PVFSMOD
bpsh -pad /sbin/rmmod $PVFSMOD

Then I ran pvfs_client_start.sh /work, which seemed to work, except it never exited...

#!/bin/tcsh

# Phil Carns
# pcarns hubcap clemson edu
#
# This is an example script for how to get Scyld Beowulf cluster nodes
# to mount a PVFS file system.

set PVFSD = "/usr/sbin/pvfsd"
set PVFSMOD = "pvfs"
set PVFS_CLIENT_MOUNT_DIR = "/work"
set MOUNT_PVFS = "/sbin/mount.pvfs"
set PVFS_META_DIR = `bpctl -M -a`:$1

if ("$1" == "") then
        echo "usage: pvfs_client_start.sh <meta dir>"
        echo "(Causes every machine in the cluster to mount the PVFS file system)"
        exit -1
endif

# insert the pvfs module on the local and slave nodes
/sbin/modprobe $PVFSMOD
bpsh -pad /sbin/modprobe $PVFSMOD

# start the pvfsd client daemon on the local and slave nodes
$PVFSD
bpsh -pad $PVFSD

# actually mount the file system locally and on all of the slave nodes
$MOUNT_PVFS $PVFS_META_DIR $PVFS_CLIENT_MOUNT_DIR
bpsh -pad $MOUNT_PVFS $PVFS_META_DIR $PVFS_CLIENT_MOUNT_DIR


This seemed to work (well, it restarted the daemons and such), but I still can't get into /work, and I'm getting "resource busy" along with:

mount.pvfs: Device or resource busy
mount.pvfs: server 192.168.1.102 alive, but mount failed (invalid metadata directory name?)
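
One possibility the error hints at: the start script takes the metadata directory as its argument, and the log lines above name /pvfs-meta, while the script was run with /work, which is the client mount point. Whether that is the actual cause is not established here. A minimal sh sketch of a guard that would flag this case before mounting (the function name check_meta_arg is hypothetical):

```shell
# Warn when the metadata-directory argument equals the client mount point.
# $1 = metadata dir argument, $2 = client mount point (e.g. /work)
check_meta_arg() {
    if [ "$1" = "$2" ]; then
        echo "warning: metadata dir '$1' is the same as the mount point" >&2
        return 1
    fi
    return 0
}
```

For example, `check_meta_arg /work /work` warns and returns non-zero, while `check_meta_arg /pvfs-meta /work` passes silently.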

Comments?  Useful ideas?  A good joke???

dave





