
Re: is this null block OK?



Peter writes:
> A system running ext3 crashed this afternoon (nothing to do with ext3, bad
> network driver).   It was saving a file from emacs when it happened.  The
> file system is 0.0.6b and had ordered data as the mount option.   Let me
> emphasize this was running ext3 pure, not with SnapFS or InterMezzo layered
> on top of it.

Strangely, I had the same problem last week (with 0.0.6b + ordered data
mode).  My machine was going into a tight loop due to some LVM issues,
and I was editing (with VIM) some kernel files and recompiling between
crashes (note the filesystem in question was NOT on LVM).  After each hang,
I would reboot, edit and recompile - the compilation would often fail with
a 4k block of NULs in the source file.  Early on I also ran ctags -R on
the kernel tree, and this produced a whole lot of NUL or garbage-filled
blocks in the tags file.

At the time, I attributed it to LVM corrupting memory or something, but
with your report, I'm not so sure.  The "holes" were always 4k blocks in
the middle of the file, and _usually_ NUL filled, but in the "tags" file
some were filled with binary garbage instead (not totally random either -
at one point it looked like an incrementing sequence: 0x0003??01, 0x0003??02,
0x0003??03, etc).
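
For what it's worth, something like this will flag the kind of holes I saw -
4k-aligned blocks that are entirely NUL (just an illustrative sketch; it
assumes a 4k blocksize and won't catch the garbage-filled blocks):

#include <stdio.h>

#define BLKSIZE 4096

int main(int argc, char **argv)
{
        unsigned char buf[BLKSIZE];
        long blk = 0;
        FILE *f;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }
        if ((f = fopen(argv[1], "rb")) == NULL) {
                perror(argv[1]);
                return 1;
        }
        /* only full blocks are checked; a short tail is ignored */
        while (fread(buf, 1, BLKSIZE, f) == BLKSIZE) {
                int i, zero = 1;

                for (i = 0; i < BLKSIZE; i++)
                        if (buf[i]) {
                                zero = 0;
                                break;
                        }
                if (zero)
                        printf("all-NUL block at offset %ld\n", blk * BLKSIZE);
                blk++;
        }
        fclose(f);
        return 0;
}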

VIM appears to do the same thing as emacs (after creating a backup):

open("lvm.c", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 5

> This seems somewhat suspicious, but it's likely that I'm not understanding
> everything about ext3.

There was an older bug where ordered data mode was treated like writeback
data mode, and that might lead to this sort of bug (for new files only).
Stephen will correct me if I'm wrong, but ordered data mode should mean that
the inode metadata is only updated on disk _after_ the inode's data blocks
have been written.
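
Roughly, the difference between the two modes should look like this (purely a
sketch with made-up function names, not the actual ext3/JBD commit code):

struct transaction;                             /* opaque for this sketch */

extern void write_data_blocks(struct transaction *t);  /* file data to disk */
extern void wait_on_data_io(struct transaction *t);
extern void commit_metadata(struct transaction *t);    /* journal commit    */

/* data=ordered: the data blocks must be on disk before the metadata
 * that references them is allowed to commit */
void ordered_commit(struct transaction *t)
{
        write_data_blocks(t);
        wait_on_data_io(t);
        commit_metadata(t);
}

/* data=writeback: metadata can commit first, so after a crash the inode
 * may reference blocks whose data never made it to disk */
void writeback_commit(struct transaction *t)
{
        commit_metadata(t);
        write_data_blocks(t);                   /* may be lost in a crash */
}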

Given the sequence of events (ordered data mode):

   the file has blocks X, Y, Z
   the file is truncated
   on rewriting the file, it is allocated blocks X, M, Z
   the data blocks are written to disk while the transaction is committing
   the system crashes before the transaction is complete

There are several ways this could have corrupted the file data:
- Block Y and/or M was overwritten via bad journal recovery (shouldn't happen)
- Block Y was overwritten by metadata (I don't _think_ this will happen).
  If the truncate and allocation are in separate transactions, we should
  get a short file (didn't check end of file).
- Block M was not written to disk (X and Z may have made it).  In this case,
  the whole transaction must have failed, and we would see the old blocks
  in the inode.  Which would mean:
- Block Y was overwritten by data from another file (by the compiler?).
  Should this mean that data blocks should not be freed in the block bitmap
  until the transaction is committed (i.e. all data safely on disk), so
  that they can't be overwritten within a single transaction (see the sketch
  below)?  Would this mean that data is migrating around the disk a lot?
  Does it make a difference?  Maybe when the disk is very full.
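
A rough sketch of that last idea - blocks freed by truncate go onto a
per-transaction list and are only returned to the free bitmap once the
transaction commits, so they can't be handed out again in the meantime (all
names are hypothetical, this is not ext3/JBD code):

#include <stdlib.h>

extern void bitmap_clear_bit(unsigned long block);      /* really free it */

struct deferred_free {
        unsigned long block;
        struct deferred_free *next;
};

static struct deferred_free *pending;   /* freed in the running transaction */

/* called from truncate: remember the block instead of freeing it now */
int defer_block_free(unsigned long block)
{
        struct deferred_free *df = malloc(sizeof(*df));

        if (df == NULL)
                return -1;
        df->block = block;
        df->next = pending;
        pending = df;
        return 0;
}

/* called once the transaction (and all the data it ordered) is on disk */
void release_deferred_blocks(void)
{
        while (pending != NULL) {
                struct deferred_free *df = pending;

                pending = df->next;
                bitmap_clear_bit(df->block);    /* now it may be reused */
                free(df);
        }
}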

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
