
Re: LVM 0.9.1beta7 and ext3 0.0.6b



Hi again,

On Sat, 5 May 2001, Andreas Dilger wrote:

> Jay writes:
> > I've recently been playing about with recent ext3 0.0.6b and lvm 0.9.1
> > beta7 and am now able to trigger an "Attempt to refile free buffer"
> > assertion.
> >
> > This seems to "only" occur when using ext3 on the root filesystem.
> > Possibly that is related to the fact that the lvm utility I'm using to
> > reproduce this problem is modifying data in /etc.
>
> Yes, I had this same problem with LVM 0.9.1b7 and ext3 0.0.6b.
>
> > The easiest reproduction case I've come across to generate the assertion is
> > to load up lvm (insmod if necessary) and then run:
> >
> > 	while /bin/true; do vgscan -v; done
>
> The same is true even if you only do pvscan (this has no chance to blow
> up your LVM configuration).  The reason is because of LVM calling
> invalidate_buffers on all of the devices (I believe), but I haven't tracked
> down all of the reasons it is happening.  In __invalidate_buffers, Stephen
> asked to add in "&& bh->b_jlist == BJ_None" to the checks for put_last_free(),
> but this only reduced the assertions and did not remove them entirely.

Right, I've noticed the "blow away data" bit when experimenting with this
using vgscan.  If I set up one while loop running vgscan and another loop
running fsck on an LVM device, the fireworks fly: I/O errors on the device
as vgscan removes and re-adds device node entries, etc. ;)
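
For anyone wanting to try the two-loop setup, here is a rough sketch of
how it could be scripted.  The run_loop helper, the iteration counts and
the /dev/vg00/lvol1 path are made up for illustration (they are not from
my actual test), and the real invocations are left commented out since
they can damage a live LVM setup:

```shell
#!/bin/sh
# Run a command a bounded number of times and report how often it failed.
# usage: run_loop COUNT CMD [ARGS...]
# (run_loop is a hypothetical helper, not part of LVM or e2fsprogs.)
run_loop() {
    count=$1; shift
    i=0
    failures=0
    while [ "$i" -lt "$count" ]; do
        # Discard the command's output; only count non-zero exit codes.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$1: $i runs, $failures failures"
}

# Example (assumed device path; only run this on a scratch machine):
# run_loop 1000 vgscan -v &
# run_loop 1000 fsck -n /dev/vg00/lvol1 &
# wait
```

Bounding the loops instead of using "while /bin/true" just makes it
easier to see which side started failing first.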

> > Again, it doesn't seem to generate the problem when using ext2 on the root
> > filesystem even if I have ext3 in use on separate filesystems.  Also, you
> > do not need to have an LVM device actively mounted to generate this.  In
> > my case I have no active lvm devices up and running, just lvm-mod
> > insmoded.
>
> This is more than what I figured out.  Initially, I thought it had to do
> with the LVM devices themselves (on which I was running ext3), but after
> putting in debugging I also see that the buffers belong to the root device.
> In my case, I have data journaling on root.  Is this the case for you?

No, I'm using the default "ordered" journaling mode on my root and /opt
devices.

I retried my test cases using the patch Stephen suggested in reply to your
earlier query on this issue.  As you've stated, it seems to survive a bit
longer, but the problem does still surface.  Again, the hang appears to be
on my root device, not on the LVM device (which isn't even mounted or in
use, other than for the vgscan queries).

I've also tried my test case with the patch you put together, which sounds
as though it is/was working for you.  Is that the case?  It died here
pretty quickly using the two vgscan loops.  It failed when I started
rm'ing a large kernel tree from my /opt filesystem.  This time /opt is the
filesystem that hung, and the assertion is different from the earlier
"refile free buffer" one:

Message from syslogd slippey at Mon May  7 06:34:11 2001 ...
slippey kernel: Assertion failure in journal_write_metadata_buffer() at
journal.c line 325: "buffer_jdirty(bh_in)".
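
Since different assertions are now firing, it may help to tally them from
the logs.  The following is only a sketch: parse_assert is a hypothetical
helper, and the pattern assumes each message sits on a single log line in
exactly the format shown above, so it will not match assertions that are
worded differently.

```shell
#!/bin/sh
# Read log text on stdin and print "function file line" for every
# assertion message of the form:
#   Assertion failure in FUNC() at FILE line N: "..."
# (parse_assert is a hypothetical helper; the format is assumed from the
# single message quoted in this mail.)
parse_assert() {
    sed -n 's/.*Assertion failure in \([A-Za-z0-9_]*\)() at \([^ ]*\) line \([0-9]*\):.*/\1 \2 \3/p'
}

# Example (the log path is an assumption and varies by distribution):
# parse_assert < /var/log/messages | sort | uniq -c
```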
