
[dm-devel] Problem with dm-multipath, Remounting filesystem read-only after "Detected aborted journal"



Hello,

I found this thread:
https://www.redhat.com/archives/dm-devel/2006-April/msg00046.html
It looks like a similar problem.

I am running fully multipathed blades with RHEL4 U3 on SAN storage only.
Both the failover and the multibus multipath configurations, with Oracle
I/O running on top, work without problems when we disable one SAN path.
The kernel detects LOOP UP immediately and recovery of the disabled path
is fast.

Suddenly we got "I/O errors" and the kernel remounted filesystems read-only
on all RHEL4 blades inside this bladecenter.
This was not a multipath test!
On a RHEL3 blade in the same bladecenter (using the multipath kernel
module) I found the following in /var/log/messages:

May  9 14:49:03 rhel3 kernel: scsi(0): RSCN database changed -0x2ce,0x0.
May  9 14:49:03 rhel3 kernel: scsi(0): Waiting for LIP to complete...
May  9 14:49:03 rhel3 kernel: scsi(0): Topology - (F_Port), Host Loop address 0xffff

This is the same time at which several RHEL4 filesystems went read-only.

## RHEL4/U3
[root rhel4 ~]# multipath -l
sys001 (360060e8004eb2d000000eb2d00001600)
[size=9 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:0 sda 8:0  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdc 8:32 [active][ready]
 
lun001 (360060e8004eb2d000000eb2d00000500)
[size=14 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sdb 8:16 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdd 8:48 [active][ready]
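
The kernel and multipathd messages in this report refer to paths by
major:minor number (for example 8:16) rather than by device name. As a
reading aid, here is a minimal Python sketch that turns the "multipath -l"
output above into a lookup table. The function name parse_paths and the
regular expressions are illustrative assumptions written for this output
format only; they are not part of multipath-tools.

```python
import re

# Text copied from the "multipath -l" output in this report.
MULTIPATH_OUTPUT = r"""sys001 (360060e8004eb2d000000eb2d00001600)
[size=9 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:0 sda 8:0  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdc 8:32 [active][ready]

lun001 (360060e8004eb2d000000eb2d00000500)
[size=14 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sdb 8:16 [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdd 8:48 [active][ready]
"""

# "name (wwid)" line that starts a multipath map.
MAP_RE = re.compile(r"^(\S+) \((\w+)\)$")
# Path line: host:channel:id:lun, device name, major:minor.
PATH_RE = re.compile(r"^\s*\\_ (\d+:\d+:\d+:\d+) (\w+)\s+(\d+:\d+)\s")


def parse_paths(text):
    """Return {major:minor -> {"map", "dev", "hctl"}} for every path line."""
    paths = {}
    current_map = None
    for line in text.splitlines():
        map_match = MAP_RE.match(line)
        if map_match:
            current_map = map_match.group(1)
            continue
        path_match = PATH_RE.match(line)
        if path_match and current_map:
            hctl, dev, majmin = path_match.groups()
            paths[majmin] = {"map": current_map, "dev": dev, "hctl": hctl}
    return paths


if __name__ == "__main__":
    for majmin, info in sorted(parse_paths(MULTIPATH_OUTPUT).items()):
        print(majmin, info["dev"], info["map"], info["hctl"])
```

With this table, "Failing path 8:16" reads as sdb on lun001 via host 0, and
"Failing path 8:48" as sdd on lun001 via host 1.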


## /var/log/messages
May  9 14:49:03 rhel4 kernel: SCSI error : <0 0 0 1> return code = 0x20000
May  9 14:49:03 rhel4 kernel: end_request: I/O error, dev sdb, sector 19893656
May  9 14:49:03 rhel4 kernel: device-mapper: dm-multipath: Failing path 8:16.
May  9 14:49:03 rhel4 multipathd: 8:16: mark as failed
May  9 14:49:03 rhel4 multipathd: lun001: remaining active paths: 1
May  9 14:49:03 rhel4 kernel: SCSI error : <0 0 0 0> return code = 0x20000
May  9 14:49:03 rhel4 kernel: end_request: I/O error, dev sda, sector 13374414
May  9 14:49:03 rhel4 kernel: device-mapper: dm-multipath: Failing path 8:0.
May  9 14:49:03 rhel4 multipathd: 8:0: mark as failed
May  9 14:49:03 rhel4 multipathd: sys001: remaining active paths: 1
May  9 14:49:09 rhel4 kernel: SCSI error : <1 0 0 1> return code = 0x20000
May  9 14:49:09 rhel4 kernel: end_request: I/O error, dev sdd, sector 15485360
May  9 14:49:09 rhel4 kernel: device-mapper: dm-multipath: Failing path 8:48.
May  9 14:49:09 rhel4 kernel: end_request: I/O error, dev sdd, sector 15485368
May  9 14:49:09 rhel4 kernel: Buffer I/O error on device dm-20, logical block 137223
May  9 14:49:09 rhel4 kernel: lost page write due to I/O error on dm-20
May  9 14:49:09 rhel4 multipathd: 8:48: mark as failed
May  9 14:49:09 rhel4 multipathd: lun001: remaining active paths: 0
May  9 14:49:09 rhel4 kernel: Buffer I/O error on device dm-20, logical block 137222
May  9 14:49:09 rhel4 kernel: lost page write due to I/O error on dm-20
May  9 14:49:09 rhel4 kernel: Buffer I/O error on device dm-21, logical block 366598
May  9 14:49:09 rhel4 kernel: lost page write due to I/O error on dm-21
May  9 14:49:09 rhel4 kernel: Buffer I/O error on device dm-21, logical block 366599
May  9 14:49:09 rhel4 kernel: lost page write due to I/O error on dm-21
May  9 14:49:09 rhel4 kernel: EXT3-fs error (device dm-18): ext3_get_inode_loc: unable to read inode block - inode=642553, block=1278104
May  9 14:49:09 rhel4 kernel: Aborting journal on device dm-18.
May  9 14:49:09 rhel4 kernel: Buffer I/O error on device dm-18, logical block 895
May  9 14:49:09 rhel4 kernel: lost page write due to I/O error on dm-18
May  9 14:49:09 rhel4 kernel: Buffer I/O error on device dm-18, logical block 0
May  9 14:49:09 rhel4 kernel: lost page write due to I/O error on dm-18
May  9 14:49:09 rhel4 kernel: EXT3-fs error (device dm-18) in ext3_reserve_inode_write: IO failure
May  9 14:49:09 rhel4 kernel: Buffer I/O error on device dm-18, logical block 0
May  9 14:49:09 rhel4 kernel: lost page write due to I/O error on dm-18
May  9 14:49:09 rhel4 kernel: EXT3-fs error (device dm-18) in ext3_dirty_inode: IO failure
May  9 14:49:09 rhel4 kernel: Buffer I/O error on device dm-18, logical block 0
May  9 14:49:09 rhel4 kernel: lost page write due to I/O error on dm-18
May  9 14:49:09 rhel4 kernel: EXT3-fs error (device dm-19): ext3_find_entry: reading directory #65537 offset 0
May  9 14:49:09 rhel4 kernel:
May  9 14:49:09 rhel4 kernel: Aborting journal on device dm-19.
May  9 14:49:09 rhel4 kernel: Buffer I/O error on device dm-19, logical block 585
May  9 14:49:09 rhel4 kernel: lost page write due to I/O error on dm-19
May  9 14:49:09 rhel4 kernel: Buffer I/O error on device dm-19, logical block 0
May  9 14:49:09 rhel4 kernel: lost page write due to I/O error on dm-19
May  9 14:49:09 rhel4 kernel: ext3_abort called.
May  9 14:49:09 rhel4 kernel: EXT3-fs error (device dm-19): ext3_journal_start_sb: Detected aborted journal
May  9 14:49:09 rhel4 kernel: Remounting filesystem read-only
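
Every SCSI error in the log above carries the return code 0x20000. The
following Python sketch splits that value into its four bytes. It assumes
the usual Linux SCSI mid-layer layout of the 32-bit result word (status,
message, host and driver byte, from low to high) and the host byte names
from the 2.6 kernel's include/scsi/scsi.h; whether the RHEL4 kernel matches
this exactly is an assumption. The function name decode_scsi_result is
hypothetical.

```python
# Host byte values as named in the Linux 2.6 SCSI headers (assumed to
# apply to the RHEL4 kernel that produced the log).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four component bytes."""
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": host,
        "driver": (result >> 24) & 0xFF,
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
    }


if __name__ == "__main__":
    print(decode_scsi_result(0x20000))
```

Under these assumptions 0x20000 has host byte 0x02 (DID_BUS_BUSY) and all
other bytes zero, so the error was reported by the HBA driver or transport
rather than as a SCSI status from the storage device.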

--> Oracle on dm-19 crashes after the remount. After a server reboot,
Oracle runs fine again.

## Another blade in the same bladecenter, running RHEL3 with the
multipath kernel module:

May  9 14:49:03 rhel3 kernel: scsi(0): RSCN database changed -0x2ce,0x0.
May  9 14:49:03 rhel3 kernel: scsi(0): Waiting for LIP to complete...
May  9 14:49:03 rhel3 kernel: scsi(0): Topology - (F_Port), Host Loop address 0xffff
May  9 14:49:05 rhel3 kernel: scsi(0): RSCN database changed -0x2ce,0x0.
May  9 14:49:05 rhel3 kernel: scsi(0): Waiting for LIP to complete...
May  9 14:49:05 rhel3 kernel: scsi(0): Topology - (F_Port), Host Loop address 0xffff
May  9 14:49:08 rhel3 kernel: scsi(0): RSCN database changed -0x2ce,0x0.
May  9 14:49:08 rhel3 kernel: scsi(0): Waiting for LIP to complete...
May  9 14:49:08 rhel3 kernel: scsi(0): Topology - (F_Port), Host Loop address 0xffff
May  9 14:49:08 rhel3 kernel: scsi(1): RSCN database changed -0x2d8,0x0.
May  9 14:49:08 rhel3 kernel: scsi(1): Waiting for LIP to complete...
May  9 14:49:08 rhel3 kernel: scsi(1): Topology - (F_Port), Host Loop address 0xffff
May  9 14:49:11 rhel3 kernel: scsi(1): RSCN database changed -0x2d8,0x0.
May  9 14:49:11 rhel3 kernel: scsi(1): Waiting for LIP to complete...
May  9 14:49:11 rhel3 kernel: scsi(1): Topology - (F_Port), Host Loop address 0xffff
May  9 14:49:14 rhel3 kernel: scsi(1): RSCN database changed -0x2d8,0x0.
May  9 14:49:14 rhel3 kernel: scsi(1): Waiting for LIP to complete...
May  9 14:49:14 rhel3 kernel: scsi(1): Topology - (F_Port), Host Loop address 0xffff
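
To compare these RSCN events with the path failures on the RHEL4 blade, the
timestamps can be grouped per HBA. The Python sketch below does this for
the RSCN lines of the RHEL3 log above. The function name rscn_times and the
regular expression are illustrative assumptions that fit only this log
format.

```python
import re
from collections import defaultdict

# RSCN lines copied from the RHEL3 log in this report.
RHEL3_LOG = """\
May  9 14:49:03 rhel3 kernel: scsi(0): RSCN database changed -0x2ce,0x0.
May  9 14:49:05 rhel3 kernel: scsi(0): RSCN database changed -0x2ce,0x0.
May  9 14:49:08 rhel3 kernel: scsi(0): RSCN database changed -0x2ce,0x0.
May  9 14:49:08 rhel3 kernel: scsi(1): RSCN database changed -0x2d8,0x0.
May  9 14:49:11 rhel3 kernel: scsi(1): RSCN database changed -0x2d8,0x0.
May  9 14:49:14 rhel3 kernel: scsi(1): RSCN database changed -0x2d8,0x0.
"""

# Captures the time of day and the SCSI host number of each RSCN message.
RSCN_RE = re.compile(
    r"^\w+\s+\d+ (\d\d:\d\d:\d\d) \S+ kernel: "
    r"scsi\((\d+)\): RSCN database changed"
)


def rscn_times(log):
    """Return {scsi host number -> list of RSCN timestamps, in log order}."""
    times = defaultdict(list)
    for line in log.splitlines():
        match = RSCN_RE.match(line)
        if match:
            timestamp, host = match.groups()
            times[int(host)].append(timestamp)
    return dict(times)


if __name__ == "__main__":
    for host, stamps in sorted(rscn_times(RHEL3_LOG).items()):
        print("scsi(%d): %s .. %s (%d events)"
              % (host, stamps[0], stamps[-1], len(stamps)))
```

On the RHEL4 blade the paths on host 0 (sda, sdb) failed at 14:49:03 and the
path on host 1 (sdd) at 14:49:09. Both times fall inside the RSCN windows
seen here for scsi(0) and scsi(1), which suggests, but does not prove, that
a fabric event hit both SAN paths within a few seconds.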


What can we do now?
Is there a kernel parameter for configuring the timeout?

regards
Thomas

