
Re: LVM 0.9.1beta7 and ext3 0.0.6b



I'm still digging into this one.  Here's what I've been able to discover.
With Stephen's suggested patch applied to __invalidate_buffers() we still
see the "refile free buffer" assertion at times.

What I'm seeing in kdb output is that the buffer head that is failing in
the refile_buffer() test has a b_dev == B_FREE and b_list == BUF_LOCKED.
All the other bits appear clear (in the kdbm_jfs bh command output).

So, not really knowing exactly what I'm doing in this case, I started
adding some hunks of code to __invalidate_buffers().  It currently looks
something like the following:

			if (!bh->b_count && bh->b_jlist == BJ_None &&
			    (destroy_dirty_buffers || !buffer_dirty(bh))) {
				if (bh->b_list == BUF_LOCKED) {
					printk("This buffer is locked and "
					       "probably shouldn't be here?\n");
					goto eep;
				}
				put_last_free(bh);
			}
eep:
			if (slept)
				goto again;

Then I ran my box for a while without touching any of the lvm stuff and
noticed that I never hit the printk statement I'd added.  After that
little controlled experiment, I got a little crazy and started messing
with the lvm commands.  Basically, any of the commands that seem to do
some form of write to the LVM config or device or such (I haven't figured
out exactly what it is yet) generate the printk messages.  Andreas, FYI,
pvscan generates far fewer of them than vgscan does, and lvscan doesn't
seem to generate any.  Using the actual LVM device (including
snapshotting, etc.) doesn't generate any of the printk messages either.
Only a subset of the lvm userland tools seem to generate the message.

It's been running with the above code hacked into it for a bit now without
generating any *other* assertions.  In all honesty I'm a bit uncertain as
to where the BUF_LOCKED list comes into play and could use a little
background if anybody would like to offer it up.  I need to do some more
research (digging through the fs code) to get a better feel for what it's
doing.

Wanted to update you with some more insight on the issue from this end
though.

Enjoy.

On Mon, 7 May 2001, Jay Weber wrote:

> Hi again,
>
> On Sat, 5 May 2001, Andreas Dilger wrote:
>
> > Jay writes:
> > > I've recently been playing about with recent ext3 0.0.6b and lvm 0.9.1
> > > beta7 and am now able to trigger an "Attempt to refile free buffer"
> > > assertion.
> > >
> > > This seems to "only" occur when using ext3 on the root filesystem.
> > > Possibly that is related to the fact that the lvm utility I'm using to
> > > reproduce this problem is modifying data in /etc.
> >
> > Yes, I had this same problem with LVM 0.9.1b7 and ext3 0.0.6b.
> >
> > > The easiest reproduction case I've come across to generate the assertion is
> > > to load up lvm (insmod if necessary) and then run:
> > >
> > > 	while /bin/true; do vgscan -v; done
> >
> > The same is true even if you only do pvscan (this has no chance to blow
> > up your LVM configuration).  The reason is because of LVM calling
> > invalidate_buffers on all of the devices (I believe), but I haven't tracked
> > down all of the reasons it is happening.  In __invalidate_buffers, Stephen
> > asked to add in "&& bh->b_jlist == BJ_None" to the checks for put_last_free(),
> > but this only reduced the assertions and did not remove them entirely.
>
> Right, I've noticed the "blow away data" bit when experimenting with this
> using vgscan: if, say, I set up one while loop running vgscan and another
> loop fsck'ing an lvm device, watch the fireworks fly with i/o errors on
> the device as vgscan removes/adds device node entries, etc. ;)
>
> > > Again, it doesn't seem to generate the problem when using ext2 on the root
> > > filesystem even if I have ext3 in use on separate filesystems.  Also, you
> > > do not need to have an LVM device actively mounted to generate this.  In
> > > my case I have no active lvm devices up and running, just lvm-mod
> > > insmoded.
> >
> > This is more than what I figured out.  Initially, I thought it had to do
> > with the LVM devices themselves (on which I was running ext3), but after
> > putting in debugging I also see that the buffers belong to the root device.
> > In my case, I have data journaling on root.  Is this the case for you?
>
> Using default "ordered" journaling mode on my root and /opt devices.
>
> I retried my test cases using Stephen's suggested patch to your prior
> query on this issue.  As you've stated, it seems to survive a bit longer,
> but the problem does still surface.  Again, it appears to be hanging on my
> root device and is not hanging the LVM device (it's not even mounted or
> being used, other than the vgscan queries).
>
> I've also tried my test case using the patch you put together that sounds
> as though it is/was working for you.  Is that the case?  It died here
> pretty quickly using the two vgscan loops.  It failed when I started
> rm'ing a large kernel tree from my /opt filesystem.  /opt is the guy that
> hung and the assertion is different, not the "refile buffer" as prior.
>
> Message from syslogd slippey at Mon May  7 06:34:11 2001 ...
> slippey kernel: Assertion failure in journal_write_metadata_buffer() at
> journal.c line 325: "buffer_jdirty(bh_in)".
>
>
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users redhat com
> https://listman.redhat.com/mailman/listinfo/ext3-users
>




