[Linux-cluster] GFS problem
- From: "Claudio Tassini" <claudio tassini gmail com>
- To: linux-cluster redhat com
- Subject: [Linux-cluster] GFS problem
- Date: Fri, 24 Nov 2006 12:15:44 +0100
Hi all,
I have a two-node cluster. Every time I shut down one of the cluster nodes, the console of the other node prints these errors:
SCSI error : <1 0 1 1> return code = 0x20000
end_request: I/O error, dev sde, sector 33433224
device-mapper: dm-multipath: Failing path 8:64.
SCSI error : <1 0 1 1> return code = 0x20000
end_request: I/O error, dev sde, sector 33886548
SCSI error : <1 0 1 1> return code = 0x20000
...........
device-mapper: dm-multipath: Failing path 8:112.
SCSI error : <2 0 1 1> return code = 0x20000
end_request: I/O error, dev sdh, sector 342386776
Buffer I/O error on device diapered_dm-3, logical block 85596598
end_request: I/O error, dev sdh, sector 342386780
Buffer I/O error on device diapered_dm-3, logical block 85596599
Buffer I/O error on device diapered_dm-3, logical block 85596600
Buffer I/O error on device diapered_dm-3, logical block 85596601
..........
GFS: fsid=notartel:not-net.0: fatal: I/O error
GFS: fsid=notartel:not-net.0: block = 1712284
GFS: fsid=notartel:not-net.0: function = gfs_dreread
GFS: fsid=notartel:not-net.0: file = /usr/src/build/765946-x86_64/BUILD/gfs-kernel-2.6.9-58/smp/src/gfs/dio.c, line = 576
GFS: fsid=notartel:not-net.0: time = 1164365987
GFS: fsid=notartel:not-net.0: about to withdraw from the cluster
GFS: fsid=notartel:not-net.0: waiting for outstanding I/O
..........
Buffer I/O error on device diapered_dm-3, logical block 85596582
GFS: fsid=notartel:not-net.0: telling LM to withdraw
lock_dlm: withdraw abandoned memory
GFS: fsid=notartel:not-net.0: withdrawn
GFS: Trying to join cluster "lock_dlm", "notartel:not-it"
GFS: fsid=notartel:not-it.0: Joined cluster. Now mounting FS...
GFS: fsid=notartel:not-it.0: jid=0: Trying to acquire journal lock...
GFS: fsid=notartel:not-it.0: jid=0: Looking at journal...
GFS: fsid=notartel:not-it.0: jid=0: Done
GFS: fsid=notartel:not-it.0: jid=1: Trying to acquire journal lock...
GFS: fsid=notartel:not-it.0: jid=1: Looking at journal...
GFS: fsid=notartel:not-it.0: jid=1: Done
GFS: fsid=notartel:not-it.0: jid=2: Trying to acquire journal lock...
GFS: fsid=notartel:not-it.0: jid=2: Looking at journal...
GFS: fsid=notartel:not-it.0: jid=2: Done
GFS: Trying to join cluster "lock_dlm", "notartel:not-net"
GFS: fsid=notartel:not-net.0: Joined cluster. Now mounting FS...
GFS: fsid=notartel:not-net.0: jid=0: Trying to acquire journal lock...
GFS: fsid=notartel:not-net.0: jid=0: Looking at journal...
GFS: fsid=notartel:not-net.0: jid=0: Acquiring the transaction lock...
GFS: fsid=notartel:not-net.0: jid=0: Replaying journal...
GFS: fsid=notartel:not-net.0: jid=0: Replayed 533 of 1995 blocks
GFS: fsid=notartel:not-net.0: jid=0: replays = 533, skips = 293, sames = 1169
GFS: fsid=notartel:not-net.0: jid=0: Journal replayed in 1s
GFS: fsid=notartel:not-net.0: jid=0: Done
GFS: fsid=notartel:not-net.0: jid=1: Trying to acquire journal lock...
GFS: fsid=notartel:not-net.0: jid=1: Looking at journal...
GFS: fsid=notartel:not-net.0: jid=1: Done
GFS: fsid=notartel:not-net.0: jid=2: Trying to acquire journal lock...
GFS: fsid=notartel:not-net.0: jid=2: Looking at journal...
GFS: fsid=notartel:not-net.0: jid=2: Done
GFS: fsid=notartel:not-net.0: Scanning for log elements...
GFS: fsid=notartel:not-net.0: Found 6 unlinked inodes
GFS: fsid=notartel:not-net.0: Found quota changes for 2 IDs
GFS: fsid=notartel:not-net.0: Done
and the services on that node are restarted.
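For reference, here is a minimal sketch of how the numbers in the log above can be read. It assumes the standard Linux layout of the SCSI result word (status, message, host and driver bytes, from least to most significant) and the usual numbering of 16 minors per disk under major 8. The helper names decode_scsi_result and sd_name are hypothetical, and the host-byte table lists only a subset of the kernel's codes.

```python
# Subset of the host-byte codes defined by the Linux SCSI midlayer.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_scsi_result(code):
    """Split a 32-bit SCSI result word into its four one-byte fields."""
    return {
        "status": code & 0xFF,
        "msg": (code >> 8) & 0xFF,
        "host": (code >> 16) & 0xFF,
        "driver": (code >> 24) & 0xFF,
    }


def sd_name(major, minor):
    """Map a major:minor pair to an sd device name (major 8 only: sda..sdp)."""
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    index, partition = divmod(minor, 16)  # 16 minors are reserved per disk
    name = "sd" + chr(ord("a") + index)
    return name if partition == 0 else name + str(partition)


if __name__ == "__main__":
    fields = decode_scsi_result(0x20000)
    print(fields, HOST_BYTE_NAMES.get(fields["host"], "unknown"))
    print(sd_name(8, 64), sd_name(8, 112))
```

Read this way, return code 0x20000 carries host byte 0x02 (DID_BUS_BUSY), and the failed paths 8:64 and 8:112 correspond to sde and sdh, the same devices named in the end_request lines.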
The topology is as follows:
Two Sun Fire X4200 servers, each equipped with two QLogic (Sun) HBAs, which lspci shows as:
Fibre Channel: QLogic Corp. QLA6322 Fibre Channel Adapter (rev 03)
connected via two SANbox2 FC switches (also from QLogic) to a Sun StorEdge 3510 RAID array.
The cluster runs a single mail service, which mounts three GFS filesystems and then starts postfix and courier-imap.
The problem seems to occur when the QLogic driver (qla6312) is loaded or unloaded. I managed to reproduce it by running modprobe -r qla6312 followed by modprobe qla6312: the other node immediately starts logging SCSI errors, until the GFS filesystems hang and are withdrawn.
Any idea whether this is a GFS fault or purely a driver issue? If it is the latter, which mailing list should I post to?
Thanks in advance
--
Claudio Tassini