
Re: how to counteract slowdown



Daniel Pittman wrote:
>
> ... 
> > Postfix does lots of fsync()s.
> 
> Yup. I thought one of the features of the journaled data mode was that
> it helped here, because the fsync() only needed to hit the journal. Can
> that actually block on write-out?

It has to - that's the nature of fsync() - it can't return
until the file is written to non-volatile storage.
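If you want to see how long each fsync() actually blocks on a given filesystem, something along these lines would do. It is only a sketch: timed_append_fsync() is a made-up helper, not anything from Postfix or the kernel, and it assumes a POSIX system that provides clock_gettime().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: append len bytes to path, call fsync(), and
 * report how long the fsync() call itself blocked, in seconds.
 * Returns 0 on success, -1 on any failure.
 */
int timed_append_fsync(const char *path, const void *buf, size_t len,
                       double *fsync_seconds)
{
    struct timespec t0, t1;
    const char *p = buf;
    int fd, rc;

    assert(path != NULL && buf != NULL && fsync_seconds != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is queued. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Time only the fsync(): this is the part that waits for the disk. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    rc = fsync(fd);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *fsync_seconds = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Calling it in a loop from a small main() while the mail fetch is running, and printing the returned time, should show which fsync() calls are the slow ones.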

There is some short-term advantage to the fact that we'll
write all the data into the journal in a linear swipe,
but once the write load has lasted more than 30 seconds,
or the system-wide allowable dirty buffer limit is exceeded, we
start writing things into the main filesystem, and
that's going to pull the disk head away from the journal.

So a 20-second burst of write activity will seem really quick
on ext2, because all the heavy lifting starts afterwards.

We get some benefit from the linear journal for speeding
up fsync(), but the real advantage kicks in when there
are many threads _all_ doing fsync(), especially in
different directories, as on large MTAs.  The different-directories
requirement is an artifact of VFS locking - fixable, but not,
I suspect, in 2.4.


> >> Fetchmail continues doing this for ~30 seconds, fetching mail from a
> >> remote system and feeding it to Postfix fairly quickly.
> >>
> >> Postfix, in the meantime, shuffles the file name around between a few
> >> directories on that first disk, then appends the data to ~/Mailbox.
> >> This is on a second partition, but the same disk, with a 100MB
> >> journal and data=journaled.
> >>
> >> The total data fetched is well under 5MB. Even accounting for ten
> >> copies of that, it should *never* have filled the journal more than
> >> 50% full -- and the mail fetch has been the *only* read/write
> >> activity (other than inode atime) going on at that stage...
> >
> > Well, there are a lot of synchronous writes going on here.  They cost.
> 
> Sure. It's probably just that I misunderstand the problem and have
> assigned it to the journal flush because it so closely mirrors the time
> that kjournald waits between flushes, and because that's the only
> difference between ext2 and ext3 which don't and do, respectively, show
> this...

Oh, it's surely ext3 commits.

If you build the kernel with ext3 debugging support, and
run

	echo 1 > /proc/sys/fs/jbd-debug
	service syslog stop
	dmesg -n 8

then you can watch the commit activity as the workload proceeds.
If you've a good eye, you can see where the stalls occur, although
one does need to go into the code to correlate things.

Of course, hitting ^C in kgdb at the critical time and then
having a poke around the various processes is way more useful :)

> >> I can't for the life of me see why my system ends up blocking on
> >> writes every five seconds during this process. It strikes me that the
> >> data should all be hitting the journal and, at worst, starting to
> >> flush gently. No blocking of anything, if possible.
> >
> > That's what should be happening for journalled data mode.  We write
> > everything in a nice slurp into the journal and then leave it for
> > kupdate writeback.   Unless we come under pressure for journal space
> > or memory, in which case a complete flush is forced.
> 
> Is there any way for me to tell if the journal is getting full enough to
> require flushing? Memory, incidentally, is 256MB and is sitting with:

Well, turning on JBD debugging will give you an indication
of when commit fires.  The other important factor here is
how many dirty buffers there are in the system.  Once this
reaches a certain proportion of total memory, writeout will
commence.  That proportion is set by the first parameter in
/proc/sys/vm/bdflush; by default it is 40% of memory.
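To put a rough number on that for the 256MB machine mentioned above, here is a small sketch of the arithmetic. Both function names are made up for illustration. It treats the whole 256MB as the base for the percentage, which is a simplification: the kernel's own accounting of what counts as "total memory" may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: pull the first integer out of a line such as
 * the contents of /proc/sys/vm/bdflush.  Returns 0 on success, -1 if
 * the line does not start with a number.
 */
int first_int_field(const char *line, int *out)
{
    char *end;
    long v;

    assert(line != NULL && out != NULL);

    v = strtol(line, &end, 10);
    if (end == line)
        return -1;
    *out = (int)v;
    return 0;
}

/*
 * Hypothetical helper: number of bytes of dirty buffers at which
 * writeout would start, given total memory and a percentage.
 * Assumption: the percentage applies to all of total_bytes.
 */
unsigned long long dirty_threshold_bytes(unsigned long long total_bytes,
                                         unsigned int percent)
{
    return total_bytes * percent / 100;
}
```

With 256MB and the default of 40, dirty_threshold_bytes(256ULL * 1024 * 1024, 40) comes to 107374182 bytes, a little over 102MB, so a mail fetch of under 5MB should be nowhere near that limit on its own.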

And whoops.  We don't call balance_dirty() as we refile
journalled buffers for writeback in journalled data mode.
It probably doesn't make a lot of difference on largish
memory machines, but it's a bug.

--- linux-2.4.15-pre4/fs/jbd/commit.c	Mon Nov 12 11:16:12 2001
+++ linux-akpm/fs/jbd/commit.c	Mon Nov 12 21:28:40 2001
@@ -659,6 +659,7 @@ skip_commit:
 			__brelse(bh);
 		}
 		spin_unlock(&journal_datalist_lock);
+		balance_dirty();
 	}
 
 	/* Done with this transaction! */


> 
> Maybe I should hack the driver for a longer (or configurable) delay in
> flushing and see if I can reproduce the issue...

I doubt if it'll help, but the `HZ * 5' in journal_init_common()
(the five-second commit interval) is what to tweak.




