ext3 file system becoming read only

Mon Oct 1 13:18:15 UTC 2007

Thanks Jordi,

Yes, we are checking everything first; only then will we proceed with the kernel
update.

Thanks again

--- Jordi Prats <jprats at cesca.es> wrote:

> Hi Swapana,
> An update is always a good idea. On RHEL, updates usually go smoothly, but
> have you checked your FC switch for errors on each port? You could
> also check your SAN controllers, or run some diagnostics to be sure it's
> not a problem with your SAN. If your active controller reboots suddenly, it
> can cause I/O errors that corrupt your journal.
> 
> regards,
> Jordi
> 
> 
> 
> Swapana Ghosh wrote:
> > Hi,
> >
> > As I explained in my first posting, the 'read-only' issue is not limited to
> > one server; it is happening on a few servers, which are generally 'oracle'
> > database servers. Very recently it happened to an 'oracle' application
> > server. As a temporary workaround, we are remounting the file system and
> > also running fsck.
> >
> > While searching the Red Hat knowledge base, I found the following URL; the
> > problem it describes is similar to ours:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=213921
> >
> > It says that this is a kernel bug.
> >
> > We are not sure whether to upgrade to a newer kernel or not; please
> > advise.
> >
> > Thanks
> >
> >
> > --- tweeks <tweeks at rackspace.com> wrote:
> >
> >   
> >> The EL4 kernel is wacky when it comes to the I/O scheduler locking up and
> >> causing ext3 to remount RO. Various hardware hiccups can cause it to go RO.
> >>
> >> And when it does.. you need to tread lightly or you could lose everything.
> >>
> >> If your ext3 filesystem had problems and remounted read-only, I would
> >> strongly advise /against/ simply fscking it. Often, when your filesystem
> >> has gone RO, it may have been that way for 30 minutes or more. Just
> >> rebooting or fscking is a great way to lose everything (i.e. everything
> >> being dumped into /lost+found/).
> >>
> >> Instead, I would recommend:
> >> 1) Reboot into a rescue CD environment (without letting the rescue
> >>    environment mount or fsck your filesystems).
> >> 2) Nuke the ext3 journal:
> >> 	tune2fs -O ^has_journal /dev/<rootfs>
> >>   (possibly doing the same for other problem partitions)
> >> 3) Do a fake (read-only) fsck to see the extent of the damage:
> >> 	fsck -fn /dev/<rootfs>
> >>   (after checking things out, use "-fy" once you're sure that it's safe)
> >> 4) Rebuild the journal with:
> >> 	tune2fs -j /dev/<rootfs>
> >>   (rerun at least once until a "clean" result is repeatable)
> >> 5) Mount and check things out:
> >> 	mkdir /mnt/tmp && mount -t ext3 /dev/<rootfs> /mnt/tmp
> >> 6) Gracefully umount and reboot:
> >> 	umount /mnt/tmp && shutdown -rf now && exit
> >>
> >> Tweeks
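Tweeks' steps can be collected into a small dry-run helper. This is a hypothetical sketch, not something posted in the thread: the function name `ext3_recovery_plan` is made up, and it only prints the commands for one device so they can be reviewed, and then run by hand, from the rescue environment. It assumes the device is unmounted and that the order of steps above applies unchanged.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print, but do NOT execute,
# the offline recovery sequence for one device, in the order given above.
# Assumes you are booted from rescue media and the device is unmounted.
ext3_recovery_plan() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: ext3_recovery_plan /dev/DEVICE" >&2
        return 1
    fi
    echo "tune2fs -O ^has_journal $dev"                        # 2) drop the journal
    echo "fsck -fn $dev"                                        # 3) report damage, change nothing
    echo "fsck -fy $dev"                                        #    only after reviewing the -fn output
    echo "tune2fs -j $dev"                                      # 4) recreate the journal
    echo "mkdir -p /mnt/tmp && mount -t ext3 $dev /mnt/tmp"     # 5) inspect the data
    echo "umount /mnt/tmp"                                      # 6) then reboot
}
```

For example, `ext3_recovery_plan /dev/mapper/<vg>-<lv>` prints six commands, starting with the `tune2fs -O ^has_journal` step. Printing instead of executing is deliberate: every step modifies the filesystem, and the advice above is to look at the `fsck -fn` output before going any further.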
> >>
> >> On Tuesday 25 September 2007 11:47, Swapana Ghosh wrote:
> >>     
> >>> Hi Jordi,
> >>>
> >>> Thanks for your reply.  I will test the way you suggested.
> >>>
> >>> Thanks
> >>> -swapna
> >>>
> >>> --- Jordi Prats <jprats at cesca.es> wrote:
> >>>       
> >>>> Hi,
> >>>> This looks like what happened to me. I did the following to solve it:
> >>>>
> >>>> Mark the filesystem as not having a journal (taking it back to ext2):
> >>>>
> >>>> tune2fs -O ^has_journal /dev/cciss/c0d0p2
> >>>>
> >>>> fsck it to delete the journal:
> >>>>
> >>>> e2fsck /dev/cciss/c0d0p2
> >>>>
> >>>> Create the journal again (taking it back to ext3):
> >>>>
> >>>> tune2fs -j /dev/cciss/c0d0p2
> >>>>
> >>>> and finally, remount it.
> >>>>
> >>>> In my case it was a local disk, but with your SAN disk it should be
> >>>> the same.
> >>>>
> >>>> Jordi
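Both procedures assume the filesystem is not mounted when `tune2fs` and `e2fsck` are run; running e2fsck against a mounted filesystem can cause further damage. A minimal guard, assuming the Linux /proc/mounts format (source device in the first field). `is_mounted` and `require_unmounted` are hypothetical names, and the mount table path is a parameter only so the check can be tested against a sample file.

```shell
#!/bin/sh
# Hypothetical guard (not from the thread): refuse to continue if the device
# appears in the mount table. The mount table defaults to /proc/mounts.
# The match is an exact string comparison on the first field only.
is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    [ -r "$mounts" ] || return 2        # cannot tell: report an error
    # Exit 0 if some line's first field equals the device, 1 otherwise.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

require_unmounted() {
    is_mounted "$1" "$2"
    case $? in
        0) echo "refusing: $1 is mounted" >&2; return 1 ;;
        1) return 0 ;;
        *) echo "cannot read mount table" >&2; return 1 ;;  # fail closed
    esac
}
```

Usage: `require_unmounted /dev/cciss/c0d0p2 && e2fsck /dev/cciss/c0d0p2`. Because the comparison is exact, an LVM volume has to be given in the same form the mount table uses (often /dev/mapper/<vg>-<lv>), not as /dev/dm-N.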
> >>>>
> >>>> Swapana Ghosh wrote:
> >>>>         
> >>>>> Hi
> >>>>>
> >>>>> In our office environment, a partition is going "read-only" on a few
> >>>>> servers, mostly database servers; yesterday it also happened, for the
> >>>>> first time, on an application server.
> >>>>>
> >>>>> I was checking the archives and found what may be a similar issue in the
> >>>>> July 2007 archives. It would be really helpful if someone could describe
> >>>>> how it was solved.
> >>>>>
> >>>>> In our case, just as the problem started, we found the following line
> >>>>> in the log file:
> >>>>>
> >>>>>      EXT3-fs error (device dm-12): ext3_find_entry: reading directory #2015496 offset 2
> >>>>>
> >>>>> Then one blank line, followed by:
> >>>>>
> >>>>>     Aborting journal on device dm-12.
> >>>>>     ext3_abort called
> >>>>>
> >>>>>     EXT3-fs error (device dm-12): ext3_journal_start_sb: Detected aborted journal
> >>>>>     Remounting filesystem read-only
> >>>>>
> >>>>> Then the following line repeats continuously:
> >>>>>
> >>>>>     EXT3-fs error (device dm-12) in start_transaction: Journal has aborted
> >>>>>
> >>>>> This message keeps repeating until we remount the filesystem and the
> >>>>> partition becomes 'read-write' again.
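To see whether other servers hit the same sequence, the kernel log can be scanned for the journal-abort lines quoted above. A sketch, assuming kernel messages end up in /var/log/messages (the usual syslog target on RHEL, but check /etc/syslog.conf); `ext3_aborted_devices` and `ext3_ro_remounts` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread) for a syslog-style file.
# Default path /var/log/messages is an assumption about the syslog setup.

# Print each device that logged "Aborting journal on device <dev>.", once.
ext3_aborted_devices() {
    log="${1:-/var/log/messages}"
    sed -n 's/.*Aborting journal on device \([^ .]*\)\..*/\1/p' "$log" | sort -u
}

# Count how many read-only remount events the file contains.
ext3_ro_remounts() {
    log="${1:-/var/log/messages}"
    grep -c 'Remounting filesystem read-only' "$log"
}
```

Run against each affected server's log, `ext3_aborted_devices` prints names such as dm-12, which can then be compared across machines to see whether the same LUNs keep failing.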
> >>>>>
> >>>>> We could not figure out the root cause of the problem.
> >>>>>
> >>>>> We are using individual EMC LUNs, configured into LVM volume groups,
> >>>>> with the file systems mounted on logical volumes.
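Because the log names the device as dm-12 rather than by its LVM name, it helps to map it back to a logical volume. A sketch assuming that dm-N corresponds to device-mapper minor number N, and that `dmsetup ls` prints one `NAME (MAJOR, MINOR)` line per device (some versions print `(MAJOR:MINOR)` instead; both are accepted). `dm_name_for` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read `dmsetup ls` output on
# stdin and print the mapped name whose minor number matches "dm-N".
# Assumes the kernel name dm-N uses device-mapper minor number N.
dm_name_for() {
    minor="${1#dm-}"                     # "dm-12" -> "12"
    awk -v m="$minor" '{
        line = $0
        gsub(/[(),:]/, " ", line)        # "(253, 12)" or "(253:12)" -> "253 12"
        n = split(line, f, " ")
        if (n >= 3 && f[n] == m) print f[1]   # last field is the minor number
    }'
}
```

Run as `dmsetup ls | dm_name_for dm-12`; it prints the mapped name (typically of the form <vg>-<lv>), or nothing if no device has that minor number.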
> >>>>>
> >>>>> Here is the server description:
> >>>>>
> >>>>> ____________________________________________________________
> >>>>>
> >>>>> [root at server ~]# lsmod |grep -i qla
> >>>>> qla2300               130304  0
> >>>>> qla2xxx_conf          305924  0
> >>>>> qla2xxx               307448  21 qla2300
> >>>>> scsi_mod              117709  5 sg,emcp,qla2xxx,cciss,sd_mod
> >>>>>
> >>>>> ____________________________________________________________
> >>>>> [root at server ~]# cat /etc/modprobe.conf
> >>>>> alias eth0 tg3
> >>>>> alias eth1 tg3
> >>>>> alias eth2 e1000
> >>>>> alias eth3 e1000
> >>>>> alias eth4 e1000
> >>>>> alias eth5 e1000
> >>>>> alias bond0 bonding
> >>>>> alias scsi_hostadapter cciss
> >>>>> options bond0 max_bonds=2 miimon=100 mode=1
> >>>>> alias scsi_hostadapter1 qla2xxx
> >>>>> alias scsi_hostadapter2 qla2xxx_conf
> >>>>> #alias scsi_hostadapter3 qla6312
> >>>>> options qla2xxx ql2xmaxqdepth=16 qlport_down_retry=64 ql2xloginretrycount=30 ql2xfailover=0 ql2xlbType=0
> 
=== message truncated ===
