
Re: ext3 performance issue with a Berkeley db application



First of all, thank you for taking the time to look into this problem.


On Mon, 03 Feb 2003, Andrew Morton wrote:

> Doh, OK, I needed to update automake & autoconf.

Please excuse the inconvenience. It's impolite of us to impose so much
work on you just to see the problem.

>  1  0      0   5448  26396 176796    0    0     0     0 1004    11 38 62  0  0
>  1  0      0   4888  26396 177348    0    0     0     0 1004     9 36 64  0  0
>  1  0      0   4496  26396 177748    0    0     0     0 1003     9 38 62  0  0
>  0  1      0   5616  25704 176948    0    0     0 15188 1003    14 34 56  0 10
>  0  1      0   5616  25704 176948    0    0     0  7696 1075    27  0  2  0 98
>  0  1      0   5740  25704 176948    0    0     0   520 1106    50  0  1  0 99

Hoho. Seems the kernel doesn't like full write queues too much.

> Now, looking at the enormous amount of system time which the commit=120 run
> took, I assume that the application is doing a _ton_ of overwriting. 
> Redirtying the same pages again and again and again.  So poor old ext3 keeps
> rewriting them again and again.

The profile says about 99% overwrites vs. 1% writes to new pages.
However, in my experiments, and AFAIR in Greg's, the system times were
quite reasonable. I'm using the default commit interval (5 s, if I read
my logs right). Killing my test program after a minute gives:

real    0m57.872s
user    0m1.750s
sys     0m4.920s

This is an AMD Duron 700 MHz with PC-133 memory, but I don't recall if I
ran it as PC-133 CL3 or PC-100 CL2.

> You'll hit similar problems with ext2 - on a slower computer, or on a larger
> database, or on a system with the kupdate interval decreased from the 30
> second default.

decreased or increased?

Anyway, my simbf test program is also fast with ext2fs on IDE:
real    0m25.241s
user    0m5.260s
sys     0m13.870s

This is reiserfs on IDE, on a slower drive:
real    0m21.411s
user    0m5.140s
sys     0m13.990s

This is ext3fs on SCSI:
real    0m26.465s
user    0m5.250s
sys     0m13.990s
(Pretty bursty at the end: it commits 20,000 blocks in one second,
having trickled out only 200 every five seconds before that.)

This is ext3fs on IDE. What's interesting about this is that, unlike any
of the previous tests, it shows a constant "blocks out" flow of no less
than 600/s.

So what's special about the combination of "ext3fs and IDE"?

The interesting thing in my test is the vmstat 1 output -- with SCSI, a
few hundred blocks trickle out every once in a while. With IDE, I get a
constant write rate of a few hundred blocks per second. (The IDE run
lacks the big write at the end because I abort the program prematurely,
so the final fsync() never happens.)

ext3 + SCSI (reiserfs + IDE or ext2 + IDE look similar):

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 2  0  2 114276   9288  52696  99888   0   0     0   132  280   317   1   4  95
 1  0  0 114276   9280  52704  99888   0   0     0     8  237   300   2   2  96
 2  0  0 114276   9280  52704  99888   0   0     0     0  272   284   2   2  96
 3  0  0 114276   9476  52704  99888   0   0     0     0  304   339   4   0  96
 0  0  0 114276   9464  52704  99888   0   0     0     0  307   365   2   4  94
 1  0  0 114276   9836  52736  99888   0   0     0   116  335   435   4   3  93
 3  0  0 114276   9848  52736  99888   0   0     0     0  295   316   0   5  95
 1  0  0 114276   9848  52736  99888   0   0     0     0  291   330   4   0  96
 2  0  0 114276   9848  52736  99888   0   0     0     0  326   417   1   5  94
 5  0  0 114276   9668  52736  99888   0   0     0     0  432   467   4   4  92
 2  0  0 114276   9496  52768  99888   0   0     0   116  335   418   6   1  93
 5  0  0 114276  26008  52772  83208   0   0     0     0  292   300   9  84   7
 6  0  0 114276  24080  52772  85316   0   0     0     0  273   288   3  97   0
 4  0  0 114276  22040  52776  87400   0   0     0     0  285   296   6  94   0
 2  0  0 114276  19976  52776  89464   0   0     0     0  273   301   4  96   0
 2  0  0 114276  17908  52812  91508   0   0     0   124  290   310   5  95   0
 4  0  0 114276  15832  52812  93584   0   0     0     0  276   282   6  94   0
 2  0  0 114276  13760  52840  95628   0   0    24     0  274   292   5  95   0
 2  0  0 114276  11700  52840  97676   0   0     0     0  276   300   5  95   0
 3  0  0 114276   9608  52920  99688   0   0     0   232  308   321   3  97   0
 2  1  0 114276   8308  53012 100908   0   0     0 19924  322   339   2  62  36


ext3 + IDE:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 2  1  1 114276   4788  50540 107424   0   0     0   796  476   303   2   4  94
 1  1  1 114276   4788  50540 107424   0   0     0   800  474   313   1   4  95
 1  1  1 114276   4768  50572 107424   0   0     0   812  470   328   2   2  96
 2  1  1 114276   4768  50572 107424   0   0     0   680  462   308   4   1  95
 2  1  1 114276   4768  50572 107424   0   0     0   708  466   308   0   5  95
 2  1  1 114276   4388  50580 107780   0   0     0   944  459   327   2  25  73
 2  1  3 114276   4372  50604 107784   0   0     0   780  490   418   3   4  93
 3  1  1 114276   4368  50612 107784   0   0     0   672  502   357   3   3  94
 2  1  1 114276   4368  50612 107784   0   0     0   808  453   308   2   2  96
 2  1  1 114276   4296  50684 107784   0   0     0   788  520   418   2   4  94
 0  1  1 114276   4284  50684 107784   0   0     0   824  521   326   2   5  93

> My recommendation is to work out why the application is performing so much
> overwriting, and make it stop.  

Yup, make ext3 on IDE stop this high write rate. =:->

Seriously, as long as ext3 + IDE is a problem while ext2 + IDE (with 2.4
at least), reiserfs + IDE, and ext3 + SCSI aren't, there's no compelling
reason to change the application code.

Is there anything that might get in the way? Write barrier code?

Is the ext3fs jbd (journaling) code shared with other file system types I could test?

> Now without a kernel profile that's all speculation.  I need to reboot this
> machine to run a profile, so I'm going to hit send on this lot ;)

I hope my vmstat data is useful. I can compile and test a specific
kernel version if needed.

-- 
Matthias Andree




