ext3 file system becoming read only
Swapana Ghosh
swapana_ghosh at yahoo.com
Mon Oct 1 13:18:15 UTC 2007
Thanks Jordi,
Yes, we are checking everything first; only then will we proceed with the
kernel update.
Thanks again
--- Jordi Prats <jprats at cesca.es> wrote:
> Hi Swapana,
> An update is always a good idea. On RHEL, updates usually go smoothly, but
> have you checked your FC switch for errors on each port? You could
> also check your SAN controllers, or run some diagnostics to be sure it's
> not a problem on your SAN. If your active controller reboots suddenly, it
> can cause I/O errors that corrupt your journal.
>
> regards,
> Jordi
>
>
>
> Swapana Ghosh wrote:
> > Hi,
> >
> > As I explained in my first posting, the 'read-only' issue is not limited to
> > one server; it is happening on a few servers, which are generally Oracle
> > database servers. Very recently it happened on an Oracle application server.
> > As a temporary measure, we are remounting the file system and running fsck.
> >
> > While searching the Red Hat knowledge base, I found the following URL; the
> > problem described there is similar to our issue:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=213921
> >
> > It says that this is a kernel bug.
> >
> > We are not sure whether to move to a newer kernel version;
> > please advise.
> >
> > Thanks
> >
> >
> > --- tweeks <tweeks at rackspace.com> wrote:
> >
> >
> >> The EL4 kernel is wacky when it comes to the I/O scheduler locking up and
> >> causing ext3 to remount RO. Various hardware hiccups can cause it to go RO.
> >>
> >> And when it does.. you need to tread lightly or you could lose everything.
> >>
> >> If your ext3 filesystem had problems and remounted read-only, I would
> >> strongly advise /against/ simply fscking it. Often, when your filesystem
> >> has gone RO, it may have been that way for 30 minutes or more. Just
> >> rebooting or fscking is a great way to lose everything (i.e. everything
> >> being dumped into /lost+found/).
> >>
> >> Instead, I would recommend:
> >> 1) rebooting into a rescue CD environment (not allowing the rescue
> >> environment to mount or fsck your filesystems).
> >> 2) Nuke the ext3 journal:
> >> tune2fs -O ^has_journal /dev/<rootfs>
> >> (possibly doing the same for other problem partitions)
> >> 3) Do a fake fsck to see the extent of damage:
> >> fsck -fn /dev/<rootfs>
> >> (after checking things out.. use "-fy" once you're sure that it's safe)
> >> 4) Rebuild the journal with:
> >> tune2fs -j /dev/<rootfs>
> >> (rerun at least once until "clean" result is repeatable)
> >> 5) Mount and check things out,
> >> "mkdir /mnt/tmp && mount -t ext3 /dev/<rootfs> /mnt/tmp"
> >> 6) Gracefully umount & reboot:
> >> "umount /mnt/tmp && shutdown -rf now && exit"
> >>
> >> Tweeks
> >>
> >> On Tuesday 25 September 2007 11:47, Swapana Ghosh wrote:
> >>
> >>> Hi Jordi,
> >>>
> >>> Thanks for your reply. I will test the way you suggested.
> >>>
> >>> Thanks
> >>> -swapna
> >>>
> >>> --- Jordi Prats <jprats at cesca.es> wrote:
> >>>
> >>>> Hi,
> >>>> This looks like what happened to me. I did the following to solve it:
> >>>>
> >>>> Mark the filesystem as having no journal (take it to ext2):
> >>>>
> >>>> tune2fs -O ^has_journal /dev/cciss/c0d0p2
> >>>>
> >>>> fsck it to delete the journal:
> >>>>
> >>>> e2fsck /dev/cciss/c0d0p2
> >>>>
> >>>> Create the journal (take it back to ext3)
> >>>>
> >>>> tune2fs -j /dev/cciss/c0d0p2
> >>>>
> >>>> and finally, remount it.
> >>>>
> >>>> In my case it was a local disk, but it should be the same with your
> >>>> SAN disk.
> >>>>
> >>>> Jordi
> >>>>
> >>>> Swapana Ghosh wrote:
> >>>>
> >>>>> Hi
> >>>>>
> >>>>> In our office environment, a partition is becoming "read only" on a few
> >>>>> servers. These are mostly database servers, and yesterday it happened
> >>>>> for the first time on an application server.
> >>>>>
> >>>>> I checked the archives and found what may be similar issues in the
> >>>>> 2007-July archives. It would be really helpful if someone could describe
> >>>>> how they were solved.
> >>>>
> >>>>
> >>>>> In our case, just as the problem started, we found the following line in
> >>>>> the log file:
> >>>>>
> >>>>> EXT3-fs error (device dm-12): ext3_find_entry: reading directory #2015496 offset 2
> >>>>>
> >>>>> Then one blank line, followed by:
> >>>>>
> >>>>> Aborting journal on device dm-12.
> >>>>> ext3_abort called
> >>>>>
> >>>>> EXT3-fs error (device dm-12): ext3_journal_start_sb: Detected aborted journal
> >>>>> Remounting filesystem read-only
> >>>>> Then the following line repeats continuously:
> >>>>>
> >>>>> EXT3-fs error (device dm-12) in start_transaction: Journal has aborted
> >>>>>
> >>>>>
> >>>>>
> >>>>> The above message repeats until we remount the filesystem and the
> >>>>> partition becomes 'read-write' again.
> >>>>>
> >>>>> We could not figure out the root cause of the problem.
> >>>>>
> >>>>> We are using individual EMC LUNs, configured into LVM volume groups,
> >>>>> with the filesystems mounted on logical volumes.
> >>>>>
> >>>>> Here is the server description:
> >>>>>
> >>>>> ____________________________________________________________
> >>>>>
> >>>>> [root@server ~]# lsmod | grep -i qla
> >>>>> qla2300 130304 0
> >>>>> qla2xxx_conf 305924 0
> >>>>> qla2xxx 307448 21 qla2300
> >>>>> scsi_mod 117709 5 sg,emcp,qla2xxx,cciss,sd_mod
> >>>>>
> >>>>> ____________________________________________________________
> >>>>> [root@server ~]# cat /etc/modprobe.conf
> >>>>> alias eth0 tg3
> >>>>> alias eth1 tg3
> >>>>> alias eth2 e1000
> >>>>> alias eth3 e1000
> >>>>> alias eth4 e1000
> >>>>> alias eth5 e1000
> >>>>> alias bond0 bonding
> >>>>> alias scsi_hostadapter cciss
> >>>>> options bond0 max_bonds=2 miimon=100 mode=1
> >>>>> alias scsi_hostadapter1 qla2xxx
> >>>>> alias scsi_hostadapter2 qla2xxx_conf
> >>>>> #alias scsi_hostadapter3 qla6312
> >>>>> options qla2xxx ql2xmaxqdepth=16 qlport_down_retry=64
> >>>>> ql2xloginretrycount=30 ql2xfailover=0 ql2xlbType=0
>
=== message truncated ===
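
The recovery sequence quoted above (drop the journal, check without
modifying, repair, recreate the journal, test-mount) can be sketched as a
small dry-run helper. This is only a sketch under assumptions: the function
names list_ro_ext3 and print_recovery_plan are made up for illustration, the
device path is whatever you pass in, and nothing here runs the commands. It
only prints them, since per the thread they should be run by hand from a
rescue environment with the filesystem unmounted.

```shell
#!/bin/sh
# Dry-run helpers only: nothing in this file modifies a filesystem.
# Function names are hypothetical, chosen for this illustration.

# List ext3 filesystems that are currently mounted read-only.
# $1: mounts table to read (defaults to /proc/mounts).
list_ro_ext3() {
    # Fields: device, mount point, fs type, comma-separated options.
    awk '$3 == "ext3" && $4 ~ /(^|,)ro(,|$)/ { print $1 " " $2 }' \
        "${1:-/proc/mounts}"
}

# Print (do not execute) the commands from the thread for one device.
# $1: block device, e.g. /dev/cciss/c0d0p2
print_recovery_plan() {
    dev=$1
    if [ -z "$dev" ]; then
        echo "usage: print_recovery_plan <device>" >&2
        return 1
    fi
    # Order follows the quoted steps: remove journal, read-only check,
    # repair once the damage is understood, recreate journal, test-mount.
    cat <<EOF
tune2fs -O ^has_journal $dev
fsck -fn $dev
fsck -fy $dev
tune2fs -j $dev
mkdir /mnt/tmp && mount -t ext3 $dev /mnt/tmp
umount /mnt/tmp
EOF
}
```

For example, "print_recovery_plan /dev/cciss/c0d0p2" prints the six commands
in order so they can be reviewed before anything is typed, and list_ro_ext3
with no argument reads /proc/mounts to show which ext3 mounts have already
gone read-only.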
More information about the Ext3-users mailing list