
Re: Ext3 Journal corruption on hitachi deskstars



On Wed, Feb 09, 2005 at 01:17:48AM +0100, Christian wrote:
> 
> maybe you can elaborate a bit more on the "corrupted journals": what does
> "fsck" say, what's in the kernel log (during mount). if we know the
> symptoms, perhaps someone can find the root of the problem...
> 

I'm seeing the same behavior, though after only a few hours under heavy
load, and also with two new Hitachi SATA drives (showing up as sda and
sdb). The system is Fedora Core 3 running kernel 2.6.10-1.770_FC3. I had
to boot with the "irqpoll" kernel option to keep the machine from
locking up hard when the SATA driver loads.

From /var/log/dmesg:

SCSI subsystem initialized
libata version 1.10 loaded.
sata_sil version 0.8
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
PCI: setting IRQ 11 as level-triggered
ACPI: PCI interrupt 0000:00:11.0[A] -> GSI 11 (level, low) -> IRQ 11
ata1: SATA max UDMA/100 cmd 0xE083A080 ctl 0xE083A08A bmdma 0xE083A000 irq 11
ata2: SATA max UDMA/100 cmd 0xE083A0C0 ctl 0xE083A0CA bmdma 0xE083A008 irq 11
irq 11: nobody cared (try booting with the "irqpoll" option.
 [<c013e0a0>] __report_bad_irq+0x2b/0x68
 [<c013e169>] note_interrupt+0x73/0x96
 [<c013d6cc>] __do_IRQ+0x1bd/0x249
 [<c0104e04>] do_IRQ+0x5e/0x7a
 =======================
 [<c01035b2>] common_interrupt+0x1a/0x20
 [<c0120b50>] __do_softirq+0x2c/0x79
 [<c0104edc>] do_softirq+0x38/0x3f
 =======================
 [<c0104e16>] do_IRQ+0x70/0x7a
 [<c01035b2>] common_interrupt+0x1a/0x20
 [<c020a182>] acpi_processor_idle+0xf1/0x1f6
 [<c010108f>] cpu_idle+0x1f/0x34
 [<c03a5665>] start_kernel+0x16b/0x16d
handlers:
[<e08a1be7>] (ata_interrupt+0x0/0x210 [libata])
Disabling IRQ #11
ata1: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e8 86:3c02 87:4023 88:203f
ata1: dev 0 ATA, max UDMA/100, 488397168 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: dev 0 cfg 49:2f00 82:74eb 83:7fea 84:4023 85:74e8 86:3c02 87:4023 88:203f
ata2: dev 0 ATA, max UDMA/100, 488397168 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
  Vendor: ATA       Model: HDS722525VLSA80   Rev: V36O
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: HDS722525VLSA80   Rev: V36O
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
 sda: sda1
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0



In the past 6 hours, I've recorded the following (grepped from dmesg
with -i ext3):
EXT3-fs error (device sdb1): ext3_new_block: Allocating block in system zone - block = 2588673
EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sdb1) in ext3_prepare_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal
EXT3-fs error (device sdb1) in start_transaction: Journal has aborted
EXT3-fs warning (device sdb1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
EXT3-fs warning (device sdb1): ext3_clear_journal_err: Marking fs in need of filesystem check.
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on sdb1, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with journal data mode.
EXT3-fs warning (device sdb1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
EXT3-fs warning (device sdb1): ext3_clear_journal_err: Marking fs in need of filesystem check.
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on sdb1, internal journal
EXT3-fs: mounted filesystem with journal data mode.
EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #7618561: rec_len % 4 != 0 - offset=0, inode=1179011410, rec_len=58182, name_len=139
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #7618561: rec_len % 4 != 0 - offset=0, inode=1179011410, rec_len=58182, name_len=139
EXT3-fs error (device sda1) in start_transaction: Journal has aborted
EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #7618561: rec_len % 4 != 0 - offset=0, inode=1179011410, rec_len=58182, name_len=139
EXT3-fs error (device sda1): ext3_readdir: bad entry in directory #7618561: rec_len % 4 != 0 - offset=0, inode=1179011410, rec_len=58182, name_len=139
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with journal data mode.
EXT3 FS on hdg1, internal journal
EXT3-fs: mounted filesystem with journal data mode.
EXT3 FS on hdh1, internal journal
EXT3-fs: mounted filesystem with journal data mode.
EXT3 FS on sdb1, internal journal
EXT3-fs: mounted filesystem with journal data mode.
EXT3-fs error (device sda1): ext3_free_blocks_sb: bit already cleared for block 15261701
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sda1) in ext3_free_blocks_sb: Journal has aborted
EXT3-fs error (device sda1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sda1) in ext3_truncate: Journal has aborted
EXT3-fs error (device sda1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sda1) in ext3_orphan_del: Journal has aborted
EXT3-fs error (device sda1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sda1) in ext3_delete_inode: Journal has aborted
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with journal data mode.
EXT3 FS on hdg1, internal journal
EXT3-fs: mounted filesystem with journal data mode.
EXT3 FS on hdh1, internal journal
EXT3-fs: mounted filesystem with journal data mode.
EXT3 FS on sdb1, internal journal
EXT3-fs: mounted filesystem with journal data mode.
EXT3-fs error (device sdb1): ext3_add_entry: bad entry in directory #1982465: rec_len % 4 != 0 - offset=0, inode=1179011410, rec_len=46658, name_len=117
ext3_abort called.
EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal
EXT3-fs error (device sdb1) in start_transaction: Journal has aborted
EXT3-fs error (device sdb1) in ext3_create: IO failure
EXT3 FS on sdb1, internal journal
EXT3-fs: mounted filesystem with journal data mode.
EXT3-fs error (device sdb1): ext3_new_block: Allocating block in system zone - block = 19431424
EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device sdb1) in ext3_prepare_write: Journal has aborted
ext3_abort called.
EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal
EXT3-fs error (device sdb1) in start_transaction: Journal has aborted
EXT3-fs warning (device sdb1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
EXT3-fs warning (device sdb1): ext3_clear_journal_err: Marking fs in need of filesystem check.
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on sdb1, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with journal data mode.
EXT3 FS on hdf1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
EXT3 FS on hdg1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
EXT3 FS on hdh1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
EXT3-fs warning (device sdb1): ext3_clear_journal_err: Filesystem error recorded from previous mount: error -87241522
EXT3-fs warning (device sdb1): ext3_clear_journal_err: Marking fs in need of filesystem check.
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on sdb1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
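
With this many repeated messages it can help to tally them per device
and per reporting function instead of reading the raw grep output. The
following is only a rough sketch: the function name tally_ext3 and the
regular expression are illustrative assumptions based on the message
formats quoted above, not part of any ext3 tooling.

```python
import re
from collections import Counter

# Matches the two message shapes seen in the log above, e.g.
#   EXT3-fs error (device sdb1): ext3_new_block: Allocating block ...
#   EXT3-fs error (device sdb1) in ext3_prepare_write: Journal has aborted
# Lines without a "(device ...)" part are deliberately ignored.
EXT3_LINE = re.compile(
    r"EXT3-fs (?P<level>error|warning) \(device (?P<dev>\w+)\)"
    r"(?::| in) (?P<func>\w+):"
)


def tally_ext3(lines):
    """Count EXT3-fs errors and warnings per (device, level, function)."""
    counts = Counter()
    for line in lines:
        match = EXT3_LINE.search(line)
        if match:
            key = (match.group("dev"), match.group("level"), match.group("func"))
            counts[key] += 1
    return counts
```

Passing it the lines of the dmesg output (for example, the lines of a
saved copy of the log) returns a Counter keyed by device, severity and
function, which makes it easier to see whether sda1 and sdb1 fail in the
same code paths.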


fsck was giving me more output and showing more errors earlier, but now
it is unable to fully repair the FS and every run just reports block
bitmap differences:

root@servo:~$ fsck -fy /dev/sdb1
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  +(3966976--3966983) +(55412736--55412739) +55412743 +(55449602--55449603) +(55449606--55449607)
Fix? yes

/dev/sdb1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdb1: 343/30539776 files (2.3% non-contiguous), 25691648/61049000 blocks
root@servo:~$ fsck -fy /dev/sdb1
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  +(3966976--3966983) +(55449600--55449607)
Fix? yes


/dev/sdb1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdb1: 343/30539776 files (2.3% non-contiguous), 25691648/61049000 blocks
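
To see how much of the damage persists between runs, the "Block bitmap
differences" lists can be expanded into individual block numbers and
compared as sets. This is a minimal sketch, assuming the entry format is
exactly what e2fsck printed above (a sign followed by a single block or
a "(first--last)" range); bitmap_blocks is a hypothetical helper name.

```python
import re

# One entry in a "Block bitmap differences:" report, either a single
# block such as "+55412743" or a range such as "+(3966976--3966983)".
ENTRY = re.compile(r"([+-])(?:\((\d+)--(\d+)\)|(\d+))")


def bitmap_blocks(text):
    """Expand a bitmap-differences report into a set of (sign, block) pairs."""
    blocks = set()
    for sign, first, last, single in ENTRY.findall(text):
        if single:
            blocks.add((sign, int(single)))
        else:
            # Ranges printed by e2fsck are inclusive at both ends.
            blocks.update((sign, b) for b in range(int(first), int(last) + 1))
    return blocks


run1 = bitmap_blocks(
    "+(3966976--3966983) +(55412736--55412739) +55412743 "
    "+(55449602--55449603) +(55449606--55449607)"
)
run2 = bitmap_blocks("+(3966976--3966983) +(55449600--55449607)")
persistent = run1 & run2  # blocks reported by both fsck runs
```

For the two runs quoted above this yields 17 and 16 blocks
respectively, 12 of which are reported in both.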

Any ideas?

On an unrelated note, is the irqpoll option the cause of this
oft-repeated message?

Mar 19 05:38:58 servo kernel: hdc: cdrom_pc_intr: The drive appears confused (ireason = 0x01)

---
    Nitin Dahyabhai <nitind pobox com>

