
Re: how to counteract slowdown



Neil Brown wrote:
> 
> On Monday November 12, pernzer redhat com wrote:
> > Hello,
> >
> > as far as I understand it, ext3 will more or less hog a machine when
> > writing out the journal. A customer is seeing a slowdown every 5
> > minutes lasting about 30 seconds, during which the machine becomes
> > more or less unusable.
> >
> > This is an NFS server serving 300 Gigs spread over 2 NFS shares.
> >
> > I'm wondering what would be the best course of action:
> >   a) make the journal bigger?
> >   b) make the journal smaller?
> >   c) switch from ordered to writeback?
> >
> > Can somebody give me a hint?
> 
> I've seen something a lot like this.
> 
> I export with "sync" (because it is the safe thing to do) and with
> "no_wdelay" (because that is nicer to ext3) and mount with
> "data=journal" because that is nicer for sync-writes.

We need to find out more about Patrick's setup.

> Under heavy NFS load, I get pauses of a few seconds every few minutes.
> If I
>    echo 40 0 0 0 60 300 60 0 0 > /proc/sys/vm/bdflush
> 
> the problem goes away.

So how long have you been sitting on this info, you, you,
you Australian, you?

> What I *think* is happening is that the journal fills up before
> bdflush flushes the data to its rightful home.  When this happens,
> ext3 blocks while it forces this data out to disk so that it can make
> more room in the journal.
> What ext3 *should* do is start pushing data out when the journal gets
> X% full for some value of X like 50 or 75.

Once the current transaction reaches 1/4 of the journal size
we start a commit.   If we then see that there isn't enough
free space in the journal (1/4 plus a bit) we force a checkpoint.
That involves forcing writeback of all checkpointable data.  It's
fairly savage.
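In round numbers, the arithmetic looks like this (a sketch of the
thresholds described above, not the actual jbd code; the journal size
and the "plus a bit" fudge are made-up illustrative values):

```shell
#!/bin/sh
# Sketch of jbd's commit/checkpoint thresholds as described above.
# Illustrative numbers only: assume a 32MB journal of 4k blocks.
journal_blocks=8192

# Once the running transaction reaches 1/4 of the journal, commit:
commit_threshold=$((journal_blocks / 4))            # 2048 blocks

# If free journal space then falls below "1/4 plus a bit", force a
# checkpoint, i.e. synchronously write back ALL checkpointable data:
checkpoint_threshold=$((journal_blocks / 4 + 256))  # "a bit" is a guess

free_blocks=1500
if [ "$free_blocks" -lt "$checkpoint_threshold" ]; then
    # This is the savage part: the caller (e.g. a knfsd thread)
    # does the seekbound writeback itself.
    echo "checkpoint: flush everything, caller blocks on seeks"
fi
```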

Now, that writeback is performed by the caller (presumably
a knfsd thread).  It could take quite some time - it'll
be seekbound.

A wild guess would be that by using a more timely bdflush to
initiate the writeback (as you've done), we're keeping the
journal space available.  So it's bdflush who does all the
waiting-while-we-seek (it doesn't need to wait, but it does,
due to some possibly bogus thinking by Linus..)

So the knfsd threads end up not getting blocked on disk seek
activity.
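For reference, here's my annotated reading of Neil's bdflush line.  The
field labels are my best guess from the 2.4-era
Documentation/sysctl/vm.txt; the layout has shifted between kernel
versions, so check against your own kernel before relying on it:

```shell
# Neil's bdflush tune.  The two fields that matter for timeliness,
# as best I can tell (field layout varies across 2.4 kernels):
#
#   echo 40 0 0 0 60 300 60 0 0 > /proc/sys/vm/bdflush
#
#   5th field: kupdate wakeup interval in jiffies
#              (60 = 0.6s at HZ=100, down from the default 500 = 5s)
#   6th field: age before a dirty buffer becomes flushable
#              (300 = 3s, down from the default 3000 = 30s)
#
# i.e. dirty data starts trickling out well before the journal fills.
if [ -w /proc/sys/vm/bdflush ]; then    # needs root and a 2.4 kernel
    echo 40 0 0 0 60 300 60 0 0 > /proc/sys/vm/bdflush
fi
```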

Could you please explain to me how knfsd threads map onto
clients?  Does a user have their "own" thread on the server?
If a particular thread were blocked for a long time, would
a particular user see this, or would another thread serve their
request?

If a particular knfsd thread were to get blocked for some time in the
underlying fs, would that affect the other threads?  Is some lock
held?

(Actually, if other threads call in and try to start transactions
when the journal is out of space, they'll probably get blocked on the
writeout as well).


> This may or may not be related to your problem, depending on what
> export and mount options you are using.
> 
> The 5 minutes sounds like the journal commit interval.  It's probably
> a long shot, but might you have a heavily fragmented journal?
> use "debugfs" to find the inode number of the journal, and then
>   stat <inode>
> to find where the blocks are.  If they are all over the place, then
> sequential journal writes might not be fast.
> 
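Concretely, the check Neil suggests looks something like this.  On ext3
filesystems made with a journal at mkfs time the journal is normally
inode 8; filesystems converted later with tune2fs may instead have a
visible /.journal file.  The device name is purely illustrative:

```shell
# Dump the journal inode's block list; inode 8 is the usual journal
# inode on ext3, and /dev/sda1 is just an example device (needs root).
debugfs -R 'stat <8>' /dev/sda1
# A healthy journal shows long contiguous runs in the block list;
# blocks scattered across the disk mean the "sequential" journal
# writes will seek.
```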

A fragmented journal would be bad, of course.  It's better to
create a journal on an empty disk if one possibly can.

However in this case, I'm theorising that the blockage is on
checkpoint writeback: i.e., the data which has already been
written into the journal and which is now being written into
the main fs - we can't recycle the journal space until this data
has hit disk, and we write it _all_ out.

Interesting.  Thanks.  We need to start checkpointing earlier,
non-blockingly.  hmm.




