Re: how to counteract slowdown



On Mon, 12 Nov 2001, Andrew Morton wrote:
> Daniel Pittman wrote:
>> On Mon, 12 Nov 2001, Andrew Morton wrote:

[...]

>> > Interesting.  Thanks.  We need to start checkpointing earlier,
>> > non-blockingly.  hmm.
>> 
>> Please.  At the moment, performance on my machine has some notable
>> hiccups when fetching mail.
>> 
>> I use fetchmail and it takes around 30 seconds to pull down 250
>> messages over the SSH tunnel it's using. So, I see the mail fetching
>> running fine for five seconds -- then stall as data is written out to
>> disk: ~1.5k write operations, in fact, stalling the system for ~3/4
>> seconds.
> 
> Is this filesystem mounted in ordered-data mode?

Journaled, which I was pretty sure I mentioned.

> Does fetchmail write 250 files, or one?

It doesn't write anything itself[1]; it feeds the messages to Postfix
over SMTP and lets the local MTA handle touching the disk.

> If both, then a write of 1500 potentially very discontiguous
> blocks will certainly take some time.  This normally won't
> cause the writer to block.  But if that same process needs
> to _read_ something, the write activity will certainly delay
> that.

That could be it. Next time I get a big mail backlog to process, I will
watch it. It's worth noting that I can see the same five-second drop-off
effect when, for example, unpacking the linux-kernel .tar.bz2 -- it runs
smoothly on ext2 but stalls every five seconds under ext3.

All on journaled data filesystems.

> The ideal fix for this is to not spread the data all over the
> disk.   Is all the write activity to files in the same directory,
> or to multiple ones?
> 
> Journalled data may provide some small improvement here.

It is journaled data. :)

>> The fetch then picks up again, happily, until the next five second
>> burst.
>> 
>> The setup is that mail is fetched and fed to a local Postfix process
>> that drops it into its internal queue system, which is on a disk
>> with a 100MB journal and data=journaled.
> 
> Postfix does lots of fsync()s.

Yup. I thought one of the features of the journaled data mode was that
it helped here, because the fsync() only needed to hit the journal. Can
that actually block on write-out?
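For concreteness, the pattern I have in mind is roughly this. It is only
a minimal userspace sketch of a write-then-fsync queue commit (the path
and message text are invented; it is not Postfix's actual code), but it
is the sequence whose latency I care about:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* Hypothetical queue file name -- Postfix's real queue layout differs. */
        const char *path = "/var/postfix/incoming/queue-file-example";
        const char buf[] = "Received: ...\nSubject: test\n\nmessage body\n";

        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* Write the message data ... */
        if (write(fd, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
                perror("write");
                return 1;
        }

        /* ... and force it to stable storage before acknowledging the
         * message to the sender.  With data=journaled my understanding is
         * that this should only need to reach the journal; the question is
         * whether it can still end up blocked behind a full journal flush. */
        if (fsync(fd) < 0) {
                perror("fsync");
                return 1;
        }

        close(fd);
        return 0;
}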

>> Fetchmail continues doing this for ~30 seconds, fetching mail from a
>> remote system and feeding it to Postfix fairly quickly.
>> 
>> Postfix, in the meantime, shuffles the file name around between a few
>> directories on that first disk, then appends the data to ~/Mailbox. 
>> This is on a second partition, but the same disk, with a 100MB
>> journal and data=journaled.
>> 
>> The total data fetched is well under 5MB. Even accounting for ten
>> copies of that, it should *never* have filled the journal more than
>> 50% full -- and the mail fetch has been the *only* read/write
>> activity (other than inode atime) going on at that stage...
> 
> Well, there are a lot of synchronous writes going on here.  They cost.

Sure. It's probably just that I misunderstand the problem. I have pinned
it on the journal flush because the stalls so closely mirror the interval
kjournald waits between flushes, and because the journal is the only
difference between ext2 (which doesn't show this) and ext3 (which
does)...

>> I can't for the life of me see why my system ends up blocking on
>> writes every five seconds during this process. It strikes me that the
>> data should all be hitting the journal and, at worst, starting to
>> flush gently. No blocking of anything, if possible.
> 
> That's what should be happening for journalled data mode.  We write
> everything in a nice slurp into the journal and then leave it for
> kupdate writeback.   Unless we come under pressure for journal space
> or memory, in which case a complete flush is forced.

Is there any way for me to tell if the journal is getting full enough to
require flushing? Memory, incidentally, is 256MB and is sitting with:

MemTotal:       287848 kB
MemFree:         11112 kB
MemShared:           0 kB
Buffers:         39876 kB
Cached:          75272 kB
SwapCached:      19116 kB
Active:         143140 kB
Inactive:       114260 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       287848 kB
LowFree:         11112 kB
SwapTotal:      602428 kB
SwapFree:       532416 kB

This is not too dissimilar from my normal running state where the issue
can be observed. 

> Either way around, the data has to be written out into the main
> filesystem sometime, somehow.

Yes, quite. It just surprised me to see the odd performance thang.

[...]

> We commit when there's data which is more than five seconds old.
> That should be OK.   

Maybe I should hack the driver for a longer (or configurable) delay in
flushing and see if I can reproduce the issue...

> The advantages of delaying 30 seconds, which is what ext2 will do, are
> fairly small for normal non-benchmark workloads.

Well, performance is not really a problem for me. I am quite happy using
fully journaled data and letting things run. I should do more analysis
and see if I am just making stuff up to annoy you. ;)
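One bit of analysis I can do without touching the kernel: run a little
probe alongside the mail fetch that times a tiny write()+fsync() once a
second and see whether the latency spikes line up with the five-second
commits. Something like this throwaway tool (my own sketch, nothing from
the ext3 tree; the probe file path is arbitrary):

#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

/* Time fsync() once a second; a stall behind a journal commit should
 * show up as a latency spike roughly every five seconds. */
int main(void)
{
        int fd = open("/home/daniel/fsync-probe", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        for (;;) {
                struct timeval t0, t1;
                char byte = 'x';

                gettimeofday(&t0, NULL);
                if (write(fd, &byte, 1) != 1)
                        perror("write");
                fsync(fd);
                gettimeofday(&t1, NULL);

                printf("fsync took %ld ms\n",
                       (t1.tv_sec - t0.tv_sec) * 1000 +
                       (t1.tv_usec - t0.tv_usec) / 1000);
                fflush(stdout);
                sleep(1);
        }
}

If fsync() against the home partition regularly jumps from a few
milliseconds to most of a second on the five-second boundary, that would
at least confirm the correlation I think I am seeing.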

> From your description, if the first fs is in ordered data mode
> I don't think there's anything wrong or fixable here.   

It's not. It's journaled data mode. All of 'em are, because the one
crash that I managed to have recently was very ugly and left me quite
paranoid about data integrity...

> The best way to improve is to be able to lay the data out better on
> disk, which is currently being looked at.

I noticed, and I am looking forward to seeing the result of it. I should
also find time to reformat (and repartition) a little and get a nice,
clean ext3 partition there. I think the journal on the root is a little
fragmented, though not much.

> Or am I being complacent here?  You provided a great description.
> Please, just a little more ;)

I am very happy to help, and would even go so far as to hack around my
kernel in an effort to get better details for you. I have written kernel
code before, though none of it has been accepted yet. ;)


So, quick summary:

Two partitions:

root (mounted /, contains /var): data=journaled, .journal == 100MB
home (mounted /home/daniel): data=journaled, hidden journal inode <8> == 100MB

Data flow:

* fetchmail reads mail from a remote socket and talks to the local SMTP
  server on another socket.
* Postfix listens on SMTP and accepts the messages from fetchmail.
* Postfix moves mail through its queues in /var/postfix/* (on root).
* Postfix delivers each message to ~/Mailbox (in /home/daniel) (on home).

Issue: Every five seconds there is a *huge* burst of write activity, on
the order of 1.5 to 2.5 thousand blocks written within a window of up to
two seconds.

The filesystems are ext3 with a 2K block size[2]. Fetchmail is visibly
delayed during this, which means that Postfix must be blocking.

This happens *each* five seconds until the process finishes.
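
(For scale: 1,500 to 2,500 blocks at 2K each is roughly 3 to 5 MB per
burst, in other words comparable to the entire fetch.)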


     Daniel


Footnotes: 
[1]  So far as I know. It /shouldn't/, because it's been told not to.

[2]  I use 2K, not 4K, because with 4K blocks I lost ~30% more space to
     slack for my file set.[3]
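
     (Rough illustrative arithmetic, not a measurement: a 1KB file
     wastes about 1KB of slack in a 2K block but about 3KB in a 4K
     block, so across tens of thousands of small mail and source files
     the 4K case loses far more.)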

[3]  I don't just buy bigger disks, incidentally, because this is a
     laptop and (good) laptop HDDs cost through the nose...

-- 
I don't think the son of a bitch [Vice-President Nixon] knows the difference
between telling the truth and lying.
        -- Harry S. Truman




