
Re: how to counteract slowdown



On Mon, 12 Nov 2001, Andrew Morton wrote:
> Daniel Pittman wrote:
>>
>> ... 
>> > Postfix does lots of fsync()s.
>> 
>> Yup. I thought one of the features of the journaled data mode was
>> that it helped here, because the fsync() only needed to hit the
>> journal. Can that actually block on write-out?
> 
> It has to - that's the nature of fsync() - it can't return
> until the file is written to non-volatile storage.

Duh. Let me fetch that brown paper bag...
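To see the difference Andrew describes, here is a minimal POSIX C sketch that times a `write()` against the `fsync()` that follows it. The function name `timed_write_fsync` and the file path used with it are illustrative, not from the thread. On a typical system `write()` returns once the data is in the page cache, while `fsync()` does not return until the kernel reports the data on stable storage, so the second figure is usually much larger.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Seconds between two CLOCK_MONOTONIC readings. */
static double elapsed(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Hypothetical helper: write `len` bytes to `path`, fsync() it, and report
 * how long each step took. Returns 0 on success, -1 on any error. */
int timed_write_fsync(const char *path, const void *buf, size_t len,
                      double *write_s, double *fsync_s)
{
    struct timespec t0, t1, t2;

    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n = write(fd, buf, len);   /* normally returns once the data is cached */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    int rc = fsync(fd);                /* returns only once the data is reported durable */
    clock_gettime(CLOCK_MONOTONIC, &t2);
    close(fd);

    if (n != (ssize_t)len || rc != 0)
        return -1;
    *write_s = elapsed(&t0, &t1);
    *fsync_s = elapsed(&t1, &t2);
    return 0;
}
```

Calling it a few times in a row on ext2 and on ext3 makes the cost of each synchronous write visible directly.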

[...]

> So a 20-second burst of write activity will seem really quick
> on ext2, because all the heavy lifting starts afterwards.

Right. Whereas on ext3, the five second commit interval causes the journal
to flush transactions, which triggers the hard work of moving data to its
final location, all in a spurt.

That in turn drives up the latency of the reads and writes Postfix
wants to do, starving it for a moment while the kernel works through
that write load.

From reading commit.c, it /looks/ like the journal is unlocked during
the asynchronous write of the data blocks, so Postfix isn't being
starved because it's waiting on the journal lock...

...but, of course, any call to fsync() ends up doing what looks like a
synchronous wait on the data hitting its final location?

Did I read the code correctly? It looks like you end up calling
journal_force_commit for any fsync(), which sets handle->h_sync to 1,
then does journal_stop.

This means that journal_stop waits in log_wait_commit for the transaction
to be flushed out of the journal and onto its final location?


...I must have that wrong. It's syncing that to the journal, not to the
blocks' final location on disk, right? So, fsync() results in any
outstanding transaction having its blocks sent to the journal and waits
on that, right?
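The distinction being drawn here, commit (blocks written into the journal) versus checkpoint (blocks later written to their home location), can be captured in a toy model. This is a sketch assuming the reading settled on above; every name in it (`toy_txn`, `toy_handle`, `toy_fsync` and so on) is hypothetical and only loosely echoes JBD's `h_sync`, `journal_stop` and `log_wait_commit`. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the JBD API. */
struct toy_txn {
    bool committed;     /* blocks + commit record written to the journal */
    bool checkpointed;  /* blocks written back to their final on-disk location */
};

struct toy_handle {
    struct toy_txn *txn;
    int h_sync;         /* set by a forced commit, as fsync() would request */
};

/* Stand-in for committing: write the transaction into the journal. */
static void toy_commit(struct toy_txn *t) { t->committed = true; }

/* Stand-in for checkpointing, which happens later and separately,
 * not as part of fsync(). */
static void toy_checkpoint(struct toy_txn *t)
{
    assert(t->committed);  /* only committed transactions can be checkpointed */
    t->checkpointed = true;
}

/* Ending a handle: if h_sync is set, the commit completes before this
 * returns, standing in for the wait in log_wait_commit. */
static void toy_stop(struct toy_handle *h)
{
    if (h->h_sync)
        toy_commit(h->txn);
}

/* fsync() in this model: force a synchronous commit of the running transaction. */
static void toy_fsync(struct toy_txn *t)
{
    struct toy_handle h = { .txn = t, .h_sync = 1 };
    toy_stop(&h);
}
```

In this model, after `toy_fsync()` the transaction is committed but not yet checkpointed; the write to its final location is a separate, later step.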


Assuming that's so, I must be very dense, as it seems that the kjournald
code is also writing only to the journal, not to the actual disk.

That means that the second or so of write-out every five seconds is
because I have plenty of spare RAM for buffering output, but my disks
are poor, slow IDE devices.

So, the five second time elapses, the journal data is flushed to the
journal itself, the disks spend ages waiting on that completing, then it
comes back and continues like a sane machine.

Sure, that makes some sense. :)
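That burst can be put in rough numbers. Assume, purely for illustration, a steady rate r of newly dirtied data, a commit interval T, and a disk that sustains bandwidth B. The values below are hypothetical, not measured on this machine. Each commit then writes about rT bytes and takes roughly:

```latex
t_{\text{flush}} \approx \frac{rT}{B},
\qquad
\frac{t_{\text{flush}}}{T} \approx \frac{r}{B}
```

With r = 1 MB/s, T = 5 s and B = 5 MB/s, that is about 1 s of write-out every 5 s. To first order (steady load, seeks ignored, no rewrites of the same blocks absorbed in memory), the fraction of time spent flushing is r/B whatever T is. Changing T mainly trades frequent short stalls for rarer, longer ones.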

> We get some benefit from the linear journal for speeding
> up fsync(), but the real advantage kicks in when there
> are many threads _all_ doing fsync(), especially in
> different directories.  Large MTAs.  The different directories
> requirement is an artifact of VFS locking - fixable, but not,
> I suspect, in 2.4.

Right. Postfix meets the multiple directories requirement; it hashes the
queue no matter what, so each message is in a distinct directory. Which,
knowing ext[23], increases the seek time dramatically because they are
all far apart. :)
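Andrew's point about many threads all calling fsync() can be probed with a small pthreads sketch, assuming a POSIX system. Each thread creates its own subdirectory under a base path, writes one 4 KiB file and fsync()s it, and the function reports the wall-clock time for the whole batch. Comparing n = 1 with, say, n = 8 on ext2 and ext3 would show whether the cost of concurrent fsync()s is being shared. The names (`parallel_fsync`, `struct job`) and the directory layout are hypothetical and do not reproduce Postfix's real queue layout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Per-thread work item: one directory, one result code. */
struct job {
    char dir[256];
    int result;   /* 0 on success, -1 on error */
};

/* Write one small file in this thread's own directory and fsync() it. */
static void *write_and_fsync(void *arg)
{
    struct job *j = arg;
    char path[300];
    char buf[4096];

    if (mkdir(j->dir, 0755) != 0 && errno != EEXIST)
        return NULL;
    snprintf(path, sizeof path, "%s/msg", j->dir);
    memset(buf, 'm', sizeof buf);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return NULL;
    if (write(fd, buf, sizeof buf) == (ssize_t)sizeof buf && fsync(fd) == 0)
        j->result = 0;
    close(fd);
    return NULL;
}

/* Run n threads concurrently, thread i using "<base>/d<i>". Stores the
 * batch's wall-clock time in *secs; returns how many threads succeeded. */
int parallel_fsync(const char *base, int n, struct job *jobs, double *secs)
{
    pthread_t tids[64];
    int started[64] = {0};
    struct timespec t0, t1;
    int ok = 0;

    assert(n > 0 && n <= 64);  /* fixed-size thread table */
    if (mkdir(base, 0755) != 0 && errno != EEXIST)
        return 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        snprintf(jobs[i].dir, sizeof jobs[i].dir, "%s/d%d", base, i);
        jobs[i].result = -1;
        started[i] = pthread_create(&tids[i], NULL, write_and_fsync, &jobs[i]) == 0;
    }
    for (int i = 0; i < n; i++) {
        if (started[i])
            pthread_join(tids[i], NULL);
        if (jobs[i].result == 0)
            ok++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *secs = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return ok;
}
```

Using one directory per thread follows Andrew's note that the different-directories requirement comes from VFS locking in 2.4.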

[...]

>> > Well, there are a lot of synchronous writes going on here. They
>> > cost.
>> 
>> Sure. It's probably just that I misunderstand the problem and have
>> blamed the journal flush because the stalls so closely mirror the
>> interval kjournald waits between flushes, and because that's the
>> only difference between ext2 (which doesn't show this) and ext3
>> (which does)...
> 
> Oh, it's surely ext3 commits.

[...]

> then you can watch the commit activity as the workload proceeds.
> If you've a good eye, you can see where the stalls occur, although
> one does need to go into the code to correlate things.

Yup. I might get that done some time soon, as it's not that big a deal
to rebuild a kernel for a little testing. :)

> Of course, hitting ^C in kgdb at the critical time and then
> having a poke around the various processes is way more useful :)

Well, probably. I don't really want to go to that extreme on a
production machine, though, even if it's only got one user.

[...]

>> Is there any way for me to tell if the journal is getting full enough
>> to require flushing? Memory, incidentally, is 256MB and is sitting
>> with:
> 
> well, turning on JBD debugging will give you an indication
> of when commit fires.  

So does the write load monitoring, I suspect, but I will verify that.

[...]

>> Maybe I should hack the driver for a longer (or configurable) delay
>> in flushing and see if I can reproduce the issue...
> 
> I doubt if it'll help, but the `HZ * 5' in journal_init_common()
> is what to tweak.

I expect it will, actually, because it's going to increase the length of
time between ext3 writing data to the journal and the kjournald thread
kicking off the commit that forces those blocks to their final
location...

...unless my analysis above is right.
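`HZ * 5` is a count of timer ticks (jiffies). HZ ticks make one second, so the product is five seconds whatever HZ the kernel was built with (HZ was 100 on i386 in 2.4). Tweaking it means changing the multiplier. The helpers below are hypothetical, only to show the unit conversion.

```c
#include <assert.h>

/* Hypothetical helper: a period of `seconds` expressed in jiffies.
 * HZ jiffies make one second, so HZ * seconds is the same wall-clock
 * period for any HZ. */
static unsigned long commit_interval_jiffies(unsigned long hz, unsigned long seconds)
{
    assert(hz > 0);
    return hz * seconds;
}

/* Hypothetical helper: convert a jiffies count back to milliseconds. */
static unsigned long jiffies_to_ms(unsigned long jiffies, unsigned long hz)
{
    assert(hz > 0);
    return jiffies * 1000UL / hz;
}
```

At HZ=100, `HZ * 5` is 500 jiffies and `HZ * 30` would be 3000. A longer interval means each commit carries more buffered data, so stalls become rarer but larger. Whether that helps here is exactly the point Andrew doubts and Daniel expects.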

        Daniel

-- 
Ignorance breeds monsters to fill up the vacancies of the soul that are
unoccupied by the verities of knowledge.
        -- Horace Mann




