[Linux-cluster] Fwd: GFS volume hangs on 3 nodes after gfs_grow

Alan A alan.zg at gmail.com
Fri Sep 26 15:53:09 UTC 2008


Thanks again, Bob.

No kernel panic on any of the nodes. I had to cold boot all 3 nodes in order
to get the cluster going (it might have been a fence issue, but I am not 100%
sure, since we use only SCSI fencing until we agree on a secondary fencing
method). What is 'scary' is that the gfs_grow command paralyzed that volume on
all 3 nodes, and I couldn't access it, unmount it, or run gfs_fsck from any
of the nodes. We will do more testing on this. By the way, do you have a
suggested "safe" method of growing and shrinking the volume other than what
is noted in the 5.2 documentation (we followed the RHEL manual)? If the GFS
volume hangs, what is the best way to try to unmount it from the node? Would
'gfs_freeze' have helped?

I checked all nodes with service gfs status, service clvmd status and service
cman status. On node4, service clvmd status hangs while listing the active
volumes:

From node3 - service clvmd status:
clvmd (pid 8892) is running...
active volumes: LogVol00 LogVol01 gfs_sda1 gfs_sdb1

Here is the node4 response to service clvmd status:
clvmd (pid 4829) is running... (and it hangs)
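
A minimal sketch of running the same three status checks with a timeout, so that a hanging query (like clvmd on node4) is reported instead of blocking the terminal. The 10-second timeout and the function names are assumptions made for illustration, not part of any cluster tool:

```python
import subprocess

# Service names are the ones checked above; the timeout value is an assumption.
SERVICES = ["cman", "clvmd", "gfs"]


def check_status(cmd, timeout=10.0):
    """Run cmd and classify the result as "ok", "failed" or "hung"."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is killed on timeout. A process stuck in uninterruptible
        # I/O may not die, so this is a best-effort check only.
        return "hung", ""
    except FileNotFoundError as exc:
        # The command itself does not exist on this machine.
        return "failed", str(exc)
    return ("ok" if done.returncode == 0 else "failed"), done.stdout


def check_cluster_services(timeout=10.0):
    """Return a dict mapping each service name to its status string."""
    return {
        name: check_status(["service", name, "status"], timeout)[0]
        for name in SERVICES
    }
```

Calling check_cluster_services() on each node would give a quick per-node picture; on node4 I would expect clvmd to come back as "hung" if the behaviour described above repeats.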

On node3 I didn't get any messages in dmesg, but I got this in
/var/log/messages:

Sep 26 10:40:17 dev03 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: Unmount
seems to be stalled. Dumping lock state...

Sep 26 10:40:17 dev03 kernel: Glock (5, 22)

Sep 26 10:40:17 dev03 kernel:   gl_flags = 1

Sep 26 10:40:17 dev03 kernel:   gl_count = 3

Sep 26 10:40:17 dev03 kernel:   gl_state = 3

Sep 26 10:40:17 dev03 kernel:   req_gh = yes

Sep 26 10:40:17 dev03 kernel:   req_bh = yes

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = no

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = no

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Request

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Waiter2

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel: Glock (1, 2)

Sep 26 10:40:17 dev03 kernel:   gl_flags = 1 2

Sep 26 10:40:17 dev03 kernel:   gl_count = 3

Sep 26 10:40:17 dev03 kernel:   gl_state = 3

Sep 26 10:40:17 dev03 kernel:   req_gh = yes

Sep 26 10:40:17 dev03 kernel:   req_bh = yes

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = no

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = no

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Request

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Waiter2

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel: Glock (2, 24)

Sep 26 10:40:17 dev03 kernel:   gl_flags = 1

Sep 26 10:40:17 dev03 kernel:   gl_count = 5

Sep 26 10:40:17 dev03 kernel:   gl_state = 1

Sep 26 10:40:17 dev03 kernel:   req_gh = yes

Sep 26 10:40:17 dev03 kernel:   req_bh = yes

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = yes

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = 1

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Request

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Waiter2

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Inode: busy

Sep 26 10:40:17 dev03 kernel: Glock (2, 22)

Sep 26 10:40:17 dev03 kernel:   gl_flags = 1 2

Sep 26 10:40:17 dev03 kernel:   gl_count = 3

Sep 26 10:40:17 dev03 kernel:   gl_state = 3

Sep 26 10:40:17 dev03 kernel:   req_gh = yes

Sep 26 10:40:17 dev03 kernel:   req_bh = yes

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = no

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = 1

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Request

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Waiter2

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel: Glock (5, 24)

Sep 26 10:40:17 dev03 kernel:   gl_flags =

Sep 26 10:40:17 dev03 kernel:   gl_count = 2

Sep 26 10:40:17 dev03 kernel:   gl_state = 3

Sep 26 10:40:17 dev03 kernel:   req_gh = no

Sep 26 10:40:17 dev03 kernel:   req_bh = no

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = yes

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = no

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Holder

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 3

Sep 26 10:40:17 dev03 kernel:     gh_flags = 5 7

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 1 6 7

Sep 26 10:40:17 dev03 kernel: Glock (5, 21)

Sep 26 10:40:17 dev03 kernel:   gl_flags = 1

Sep 26 10:40:17 dev03 kernel:   gl_count = 3

Sep 26 10:40:17 dev03 kernel:   gl_state = 3

Sep 26 10:40:17 dev03 kernel:   req_gh = yes

Sep 26 10:40:17 dev03 kernel:   req_bh = yes

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = no

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = no

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Request

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Waiter2

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel: Glock (5, 23)

Sep 26 10:40:17 dev03 kernel:   gl_flags = 1

Sep 26 10:40:17 dev03 kernel:   gl_count = 3

Sep 26 10:40:17 dev03 kernel:   gl_state = 3

Sep 26 10:40:17 dev03 kernel:   req_gh = yes

Sep 26 10:40:17 dev03 kernel:   req_bh = yes

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = no

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = no

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Request

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Waiter2

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel: Glock (1, 1)

Sep 26 10:40:17 dev03 kernel:   gl_flags = 1

Sep 26 10:40:17 dev03 kernel:   gl_count = 3

Sep 26 10:40:17 dev03 kernel:   gl_state = 3

Sep 26 10:40:17 dev03 kernel:   req_gh = yes

Sep 26 10:40:17 dev03 kernel:   req_bh = yes

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = no

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = no

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Request

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Waiter2

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel: Glock (2, 21)

Sep 26 10:40:17 dev03 kernel:   gl_flags = 1 2

Sep 26 10:40:17 dev03 kernel:   gl_count = 3

Sep 26 10:40:17 dev03 kernel:   gl_state = 3

Sep 26 10:40:17 dev03 kernel:   req_gh = yes

Sep 26 10:40:17 dev03 kernel:   req_bh = yes

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = no

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = 1

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Request

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Waiter2

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel: Glock (2, 25)

Sep 26 10:40:17 dev03 kernel:   gl_flags = 1

Sep 26 10:40:17 dev03 kernel:   gl_count = 3

Sep 26 10:40:17 dev03 kernel:   gl_state = 3

Sep 26 10:40:17 dev03 kernel:   req_gh = yes

Sep 26 10:40:17 dev03 kernel:   req_bh = yes

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = no

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = 1

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Request

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Waiter2

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel: Glock (2, 23)

Sep 26 10:40:17 dev03 kernel:   gl_flags = 1

Sep 26 10:40:17 dev03 kernel:   gl_count = 3

Sep 26 10:40:17 dev03 kernel:   gl_state = 3

Sep 26 10:40:17 dev03 kernel:   req_gh = yes

Sep 26 10:40:17 dev03 kernel:   req_bh = yes

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = no

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = 1

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Request

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Waiter2

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel: Glock (5, 25)

Sep 26 10:40:17 dev03 kernel:   gl_flags = 1

Sep 26 10:40:17 dev03 kernel:   gl_count = 3

Sep 26 10:40:17 dev03 kernel:   gl_state = 3

Sep 26 10:40:17 dev03 kernel:   req_gh = yes

Sep 26 10:40:17 dev03 kernel:   req_bh = yes

Sep 26 10:40:17 dev03 kernel:   lvb_count = 0

Sep 26 10:40:17 dev03 kernel:   object = no

Sep 26 10:40:17 dev03 kernel:   new_le = no

Sep 26 10:40:17 dev03 kernel:   incore_le = no

Sep 26 10:40:17 dev03 kernel:   reclaim = no

Sep 26 10:40:17 dev03 kernel:   aspace = no

Sep 26 10:40:17 dev03 kernel:   ail_bufs = no

Sep 26 10:40:17 dev03 kernel:   Request

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5

Sep 26 10:40:17 dev03 kernel:   Waiter2

Sep 26 10:40:17 dev03 kernel:     owner = -1

Sep 26 10:40:17 dev03 kernel:     gh_state = 0

Sep 26 10:40:17 dev03 kernel:     gh_flags = 0

Sep 26 10:40:17 dev03 kernel:     error = 0

Sep 26 10:40:17 dev03 kernel:     gh_iflags = 2 4 5
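
The dump above is long, so here is a rough sketch of a parser that condenses it to one record per glock. It assumes the unwrapped syslog format shown above ("kernel: Glock (type, number)" followed by "name = value" lines); the function names and the choice of fields to keep are my own assumptions, and it does not interpret what the state or flag numbers mean:

```python
import re

GLOCK_RE = re.compile(r"kernel:\s+Glock \((\d+), (\d+)\)")
FIELD_RE = re.compile(r"kernel:\s+(\w+) =\s*(.*)$")

# Glock-level fields to keep; holder-level fields (owner, gh_*) are ignored.
GLOCK_FIELDS = {"gl_flags", "gl_count", "gl_state", "req_gh", "req_bh", "object"}


def parse_glocks(lines):
    """Return one dict per 'Glock (type, number)' block found in lines."""
    glocks = []
    for line in lines:
        line = line.rstrip()
        match = GLOCK_RE.search(line)
        if match:
            glocks.append({"type": int(match.group(1)),
                           "number": int(match.group(2))})
            continue
        match = FIELD_RE.search(line)
        if match and glocks and match.group(1) in GLOCK_FIELDS:
            # Values are kept as raw strings, e.g. "1 2" or "" for empty flags.
            glocks[-1].setdefault(match.group(1), match.group(2))
    return glocks


def with_pending_request(glocks):
    """List (type, number) for glocks whose dump shows req_gh = yes."""
    return [(g["type"], g["number"]) for g in glocks if g.get("req_gh") == "yes"]
```

Feeding it the lines of /var/log/messages and printing with_pending_request(parse_glocks(...)) would show at a glance which glocks still had a request outstanding when the unmount stalled.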


----------------------------------------------------------------------------------------------------------------

Here are the dmesg lines, starting with node4; gfs_sdb1 is the "bad
volume":

Sep 26 09:11:46 dev04 kernel: Joined cluster. Now mounting FS...

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=0:
Trying to acquire journal lock...

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=0:
Busy

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=1:
Trying to acquire journal lock...

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=1:
Looking at journal...

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=1:
Done

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=2:
Trying to acquire journal lock...

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=2:
Looking at journal...

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=2:
Acquiring the transaction lock...

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=2:
Replaying journal...

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=2:
Replayed 0 of 9 blocks

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=2:
replays = 0, skips = 2, sames = 7

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=2:
Journal replayed in 1s

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=2:
Done

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: Scanning
for log elements...

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: Found 0
unlinked inodes

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: Found
quota changes for 0 IDs

Sep 26 09:11:46 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: Done

Sep 26 09:11:46 dev04 kernel: ACPI: PCI Interrupt 0000:01:04.2[B] -> GSI 22
(level, low) -> IRQ 233

Sep 26 09:11:47 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=0:
Trying to acquire journal lock...

Sep 26 09:11:47 dev04 gfs_controld[6351]: gfs_sdb1 finish: needs recovery
jid 0 nodeid 1 status 1

Sep 26 09:11:47 dev04 kernel: GFS: fsid=test1_cluster:gfs_sdb1.1: jid=0:
Busy


-------------------

On Fri, Sep 26, 2008 at 10:33 AM, Bob Peterson <rpeterso at redhat.com> wrote:

> ----- "Alan A" <alan.zg at gmail.com> wrote:
> | This is worse than I thought. The entire cluster is hanging upon the
> | restart command issued from the Conga - lucy box. I tried bringing
> | the gfs service down on node2 (lucy) with: service gfs stop (we are
> | not running rgmanager), and I got:
> | FATAL: Module gfs is in use.
>
> Hi Alan,
>
> It sounds like conga can't reboot the cluster because the GFS file
> system is still mounted, or is in use.  I don't know much about conga,
> so forgive my ignorance there.  You may need to unmount the gfs file
> system before you reboot.  The dmesg you sent looked perfectly normal
> to me.  Those are normal openais messages.  I'm more interested to
> see if there were any "File system withdrawn" messages, general protection
> faults, or kernel panic messages or other serious kernel errors on any
> of the nodes in the cluster just around the time of the first failure.
>
> This is just a wild guess, but I'm guessing that there was some kind
> of error, like a kernel panic that occurred a while back.  That caused
> the node to be fenced.  Perhaps the SCSI fencing locked up the device
> somehow so none of the nodes can use it.  If that's the case, you
> should be able to log in to each of the nodes, unmount the gfs file
> systems that are mounted, manually, and then reboot them.
> If it doesn't let you unmount them, it might be because some process
> is still using the GFS file system.  For example, if you're using
> NFS to export the GFS file system, you probably need to do
> service nfs stop before it will let you unmount the gfs, then reboot.
>
> So I would comb through the /var/log/messages of each node looking
> for an error message regarding the node being fenced, withdrawn, panic,
> SCSI errors, or any kind of serious errors that occurred around the
> time where you first had the problem.
>
> Regards,
>
> Bob Peterson
> Red Hat Clustering & GFS
>
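
Bob's suggestion above to comb /var/log/messages on each node could be sketched roughly as follows. The keyword list is only my reading of his message (fenced, withdrawn, panic, general protection faults, SCSI errors) and is an assumption; it will likely need tuning, since a bare "scsi" match also catches harmless boot messages:

```python
import re

# Keywords taken loosely from the advice quoted above; adjust as needed.
SUSPICIOUS = re.compile(
    r"fenc|withdr|panic|general protection|scsi",
    re.IGNORECASE,
)


def find_suspicious(lines):
    """Return the log lines that contain any of the suspicious keywords."""
    return [line.rstrip("\n") for line in lines if SUSPICIOUS.search(line)]
```

Opening /var/log/messages on each node and passing the file object to find_suspicious() should narrow the log down to the lines worth reading around the time of the first failure.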



-- 
Alan A.