
large ext3 filesystem consistently locking itself read-only




We have several large ext3 filesystem partitions. One of them sets itself to read-only after hitting journal problems. I understand that is a good thing, but obviously I need to correct the underlying problem so that it stops locking itself. Here are some details:

The OS is Red Hat EL4 x86_64 running on a Sun Fire V40z; the kernel is 2.6.9-42.0.2.ELsmp. The disk storage in question is external, attached via Fibre Channel. The HBA is a QLogic ISP2312 connected to a QLogic SAN switch, which in turn connects to four Apple Xserve RAIDs. The four Xserve RAIDs present 8 LUNs, which appear on the host as /dev/sd[cdefghij]. Those LUNs are grouped into two LVM volume groups, and the filesystems are mounted from logical volumes.

The partition in question is 8TB and currently about 92% full. One oddity about this partition is that it has a subdirectory containing over 2700 symbolic links to other partitions. Here is the output from /var/adm/messages the last time the filesystem locked itself:

Jul 17 09:01:06  kernel: Info fld=0x0, Current sdd: sense key No Sense
Jul 17 09:01:06 kernel: EXT3-fs error (device dm-3): ext3_free_blocks_sb: bit already cleared for block 786856796
Jul 17 09:01:06  kernel: Aborting journal on device dm-3.
Jul 17 09:01:06 kernel: EXT3-fs error (device dm-3) in start_transaction: Readonly filesystem
Jul 17 09:01:06  kernel: Aborting journal on device dm-3.
Jul 17 09:01:06  kernel: ext3_abort called.
Jul 17 09:01:06 kernel: EXT3-fs error (device dm-3): ext3_journal_start_sb: Detected aborted journal
Jul 17 09:01:06  kernel: Remounting filesystem read-only
Jul 17 09:01:06 kernel: EXT3-fs error (device dm-3) in start_transaction: Journal has aborted
Jul 17 09:01:06 kernel: EXT3-fs error (device dm-3): ext3_free_blocks_sb: bit already cleared for block 786856797
Jul 17 09:01:06 kernel: EXT3-fs error (device dm-3): ext3_free_blocks_sb: bit already cleared for block 786856798
Jul 17 09:01:06 kernel: EXT3-fs error (device dm-3): ext3_free_blocks_sb: bit already cleared for block 786856799
Jul 17 09:01:06 kernel: EXT3-fs error (device dm-3): ext3_free_blocks_sb: bit already cleared for block 786856800
Jul 17 09:01:06 kernel: EXT3-fs error (device dm-3) in ext3_reserve_inode_write: Journal has aborted
Jul 17 09:01:06 kernel: EXT3-fs error (device dm-3) in ext3_truncate: Journal has aborted
Jul 17 09:01:07 kernel: EXT3-fs error (device dm-3) in ext3_reserve_inode_write: Journal has aborted
Jul 17 09:01:07 kernel: EXT3-fs error (device dm-3) in ext3_orphan_del: Journal has aborted
Jul 17 09:01:07 kernel: EXT3-fs error (device dm-3) in ext3_reserve_inode_write: Journal has aborted
Jul 17 09:01:07 kernel: EXT3-fs error (device dm-3) in ext3_delete_inode: Journal has aborted
Jul 17 09:01:07 kernel: __journal_remove_journal_head: freeing b_committed_data
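To see at a glance which ext3 code paths failed and which blocks were hit, the messages can be tallied mechanically. The Python sketch below assumes one kernel message per line, as in a normal syslog file; `summarize` and `to_ranges` are hypothetical helper names, not part of any existing tool. It counts errors per device and function and collapses the "bit already cleared" block numbers into contiguous runs. For the excerpt above that is the single run 786856796-786856800.

```python
import re
from collections import Counter

# Matches both kernel message shapes seen in the log (one message per line assumed):
#   EXT3-fs error (device dm-3): ext3_free_blocks_sb: bit already cleared for block N
#   EXT3-fs error (device dm-3) in ext3_truncate: Journal has aborted
ERROR_RE = re.compile(r"EXT3-fs error \(device (?P<dev>[^)]+)\)(?::| in) (?P<func>\w+):")
BLOCK_RE = re.compile(r"bit already cleared for block (?P<block>\d+)")


def summarize(lines):
    """Count EXT3 errors per (device, function) and collect double-freed blocks."""
    counts = Counter()
    blocks = []
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            counts[(m.group("dev"), m.group("func"))] += 1
        b = BLOCK_RE.search(line)
        if b:
            blocks.append(int(b.group("block")))
    return counts, blocks


def to_ranges(blocks):
    """Collapse block numbers into sorted, inclusive (start, end) runs."""
    ranges = []
    for blk in sorted(set(blocks)):
        if ranges and blk == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], blk)  # extend the current run
        else:
            ranges.append((blk, blk))  # start a new run
    return ranges
```

For example, `counts, blocks = summarize(open("/var/adm/messages"))` followed by `to_ranges(blocks)` would show whether repeated incidents keep hitting the same region of the volume, which is one small piece of evidence for or against a storage-path problem.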

When I run fsck, it does seem to repair bad blocks and clear inodes, but on 8TB it takes a long time to run, and the corruption comes back later anyway.

I have considered upgrading the kernel, and it could be done. I suspect part of the problem is the large number of symbolic links on that partition, but without evidence it will be difficult to get people to change it. I also don't like the first line in the messages, where device sdd reports a SCSI sense key of "No Sense".
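One way to start gathering evidence about the symlinks is a simple census of that directory. The sketch below is a hypothetical helper, not an existing tool. It assumes the ext2/ext3 convention that a symlink whose target is shorter than 60 bytes is stored inside the inode (a "fast" symlink), while a longer target occupies a data block (a "slow" symlink). Under that assumption only slow symlinks allocate and free blocks, so the split may be worth knowing. It also counts dangling links. By itself it does not show that the symlinks cause the corruption.

```python
import os
from collections import Counter

# Assumption: on ext2/ext3 a target of at most 59 bytes fits inside the inode
# ("fast" symlink); longer targets need a separate data block ("slow" symlink).
FAST_SYMLINK_MAX = 59


def symlink_census(root):
    """Walk `root` without following links and classify every symlink found."""
    stats = Counter()
    for dirpath, dirnames, filenames in os.walk(root):  # followlinks=False by default
        # Symlinks to directories are listed in dirnames, so check both lists.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            stats["total"] += 1
            nbytes = len(os.fsencode(target))  # on-disk length is in bytes
            stats["fast" if nbytes <= FAST_SYMLINK_MAX else "slow"] += 1
            if not os.path.exists(path):  # exists() follows the link
                stats["dangling"] += 1
    return stats
```

Running `symlink_census()` on the subdirectory with the 2700 links gives counts that can be compared with the other partitions that do not lock up.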

Any advice on how to proceed would be appreciated. I have looked at the dumpe2fs and debugfs tools, but I don't see how to put them to good use in this case.
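dumpe2fs and debugfs can at least locate the damaged blocks and the files that own them. This is a sketch assuming the "Blocks per group" and "First block" values that `dumpe2fs -h` reports for the filesystem. For a 4 KiB block size these are typically 32768 and 0, in which case block 786856796 falls in group 24012. The block group is `(block - first_block) // blocks_per_group`, the group whose bitmap `dumpe2fs` prints. debugfs's `icheck` request maps block numbers to the inodes that own them, and `ncheck` maps inode numbers to path names. The helper names and the device path are hypothetical. debugfs opens the device read-only unless given `-w`, but results on a mounted, active filesystem may be inconsistent.

```python
import shlex


def block_group(block, blocks_per_group, first_data_block):
    """Return (group, offset_in_group) for an ext3 block number.

    blocks_per_group and first_data_block are assumed to come from the
    "Blocks per group" and "First block" lines of `dumpe2fs -h`.
    """
    rel = block - first_data_block
    return rel // blocks_per_group, rel % blocks_per_group


def debugfs_icheck(device, blocks):
    """argv for debugfs mapping block numbers to owning inodes (read-only open)."""
    return ["debugfs", "-R", "icheck " + " ".join(str(b) for b in blocks), device]


def debugfs_ncheck(device, inodes):
    """argv for debugfs mapping inode numbers to path names (read-only open)."""
    return ["debugfs", "-R", "ncheck " + " ".join(str(i) for i in inodes), device]


def as_shell(argv):
    """Quote an argv list so it can be pasted into a shell."""
    return shlex.join(argv)
```

For example, `as_shell(debugfs_icheck("/dev/VolGroupXX/LogVolXX", range(786856796, 786856801)))` produces a command to run by hand. Feeding the inode numbers it reports into `debugfs_ncheck` shows whether the affected files are the symlinks or something else.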

  Thomas Walker

