more on bogged down server

Rick Stevens rstevens at vitalstream.com
Thu Apr 13 17:32:47 UTC 2006


On Thu, 2006-04-13 at 07:39 -0700, Harold Hallikainen wrote:
> > On Wed, 2006-04-12 at 13:55 -0700, Harold Hallikainen wrote:
> >
> > <snip>
> >> >
> >> > Have you done the "vmstat 3" thing yet to see if you have context
> >> > switching going nuts?
> >> >
> >>
> >> I guess I have to read about vmstat 3. I dunno what it means, but here's
> >> some output:
> >>
> >>  vmstat 3
> >> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> >>  8  0  25068 159440  15576 456728    0    2   188    52  378    95 94  1  5  0
> >>  9  0  25068 159320  15576 456856    0    0    43     0  358    73 100  0  0  0
> >>  8  0  25068 159200  15584 456984    0    0    43    25  350    74 100  0  0  0
> >>  8  0  25068 159080  15584 457112    0    0    43     0  348    74 100  0  0  0
> >>  8  0  25068 159020  15588 457196    0    0    28    17  356    72 100  0  0  0
> >>  8  0  25068 159020  15592 457196    0    0     0    17  352    66 100  0  0  0
> >>  8  0  25068 159020  15592 457196    0    0     0     0  354    72 100  0  0  0
> >>  8  0  25068 159020  15600 457196    0    0     0    19  350    73 100  0  0  0
> >>  8  0  25068 159020  15608 457196    0    0     0    31  355    76 100  0  0  0
> >>  8  0  25068 159020  15608 457196    0    0     0     0  350    72 100  0  0  0
> >>  8  0  25068 159020  15616 457196    0    0     0    31  354    73 100  0  0  0
> >> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> >>  9  0  25068 159020  15616 457196    0    0     0     0  353    71 100  0  0  0
> >>  8  0  25068 159020  15624 457196    0    0     0    27  352    77 100  0  0  0
> >>  8  0  25068 159020  15624 457196    0    0     0    28  359    77 100  0  0  0
> >>  8  0  25068 159020  15632 457196    0    0     0    17  349    76 100  0  0  0
> >>  9  0  25068 150320  15640 457300    0    0    35    17  361    90 96  4  0  0
> >>  9  1  25068 137820  15748 458000    0    0   269     0  419   206 87 13  0  0
> >> 10  0  25068 129764  15928 461872    0    0  1348   211  629   630 93  7  0  0
> >> 10  0  25068 128296  15932 462000    0    0    43    55  355    77 99  1  0  0
> >> 10  0  25068 127876  15932 462128    0    0    43     0  350    72 99  1  0  0
> >> 10  0  25068 127632  15936 462252    0    0    41    39  353    73 100  0  0  0
> >> 10  0  25068 127508  15936 462252    0    0     0     0  355    66 100  0  0  0
> >> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> >> 11  0  25068 126336  15956 462476    0    0    79    23  350    77 100  0  0  0
> >> 10  0  25068 126032  15960 462604    0    0    43    27  356    77 100  0  0  0
> >> 10  0  25068 125912  15960 462732    0    0    43     0  351    74 100  0  0  0
> >> 10  0  25068 125792  15964 462860    0    0    43    24  351    74 100  0  0  0
> >> 10  0  25068 125672  15964 462988    0    0    43   217  365    82 100  0  0  0
> >> 10  0  25068 125552  15972 463116    0    0    43    20  353    78 100  0  0  0
> >> 10  0  25068 125296  15988 463128    0    0     7    36  385    82 100  0  0  0
> >> 10  0  25068 125060  15988 463256    0    0    43     0  354    71 99  1  0  0
> >> 10  0  25068 124632  15996 463384    0    0    43    39  349    73 99  1  0  0
> >> 10  0  25068 124264  15996 463512    0    0    43     0  355    73 100  0  0  0
> >> 10  0  25068 124204  16004 463584    0    0    24    28  350    76 100  0  0  0
> >> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> >> 11  0  25068 123960  16012 463712    0    0    43    21  356    76 100  0  0  0
> >> 10  0  25068 123720  16016 463936    0    0    76     0  353    75 100  0  0  0
> >> 10  0  25068 123360  16020 464320    0    0   128    21  350    72 100  0  0  0
> >> 10  0  25068 123120  16020 464576    0    0    85     0  357    76 100  0  0  0
> >> 10  0  25068 122812  16044 464832    0    0    92    21  366    79 100  0  0  0
> >> 10  0  25068 122568  16052 464960    0    0    43    73  362    82 100  0  0  0
> >> 10  0  25068 122336  16052 465216    0    0    85     0  356    70 100  0  0  0
> >> 10  0  25068 122216  16060 465344    0    0    43    32  349    74 100  0  0  0
> >> 12  0  25068 122156  16060 465420    0    0    25     0  357    72 100  0  0  0
> >>
> >>
> >> I restarted httpd about an hour ago. top now reports a load average of
> >> 10.06 9.06 8.16
> >
> > Hmmm.  Well, you're not going nutsy with the context switches (that's
> > the "cs" column).  You aren't swapping ("si" and "so").  You also aren't
> > bogged down in disk I/O ("bi" and "bo") and you're not getting swamped
> > with interrupts ("in").  To be truthful, I'd expect more interrupt
> > activity because of network I/O so you may still be throttled back by
> > your ISP.  I can't be sure.
> >
> > You are spending a ridiculous amount of time in userspace, so that's an
> > indicator that SOMETHING changed in your web config.  I'd go look at the
> > yum logs and see if something you use in your site (mod_perl, perl, PHP,
> > etc.) got updated and consequently broke.  You might even try an strace
> > of a couple of the web processes to see what the hell they're doing.
> >
> 
> 
> strace sounds interesting. I'll have to read up on it. Meanwhile, is there
> some way to take a pid out of top and see what url(s) httpd is working on?

Yeah, try "lsof -p <pid>".  That will list all of the files that
process <pid> has open.
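For example, something along these lines (the PID is a placeholder; grab
a real httpd PID from the top display):

```shell
# List every file the worker with PID 12345 has open (12345 is a
# placeholder; substitute a real PID taken from top).
lsof -p 12345

# Attach to the same worker and summarize its system calls; press
# Ctrl-C after a few seconds to detach and print the summary table.
strace -c -p 12345
```

The strace summary (-c) is often enough to tell whether a worker is
spinning in userspace or stuck waiting on some file or socket.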

> Prior to making the trip to Arkansas when this problem first appeared, I
> DID do an update to gallery, the photo gallery program. Looking at httpd
> logs, I see search engines calling the slideshow, which is pretty
> processor intensive. So, I've added gallery to my disallow list in
> robots.txt. Also, looking through the gallery config last night, I found
> an option that reduces CPU usage by about 90% by updating dynamic pages
> every 15 minutes instead of recreating them on the fly. I'll see how
> these two changes help.

They should.  Remember, though, that adding robots.txt won't abort any
robot probes that are already in progress.
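For reference, a disallow entry along these lines should keep well-behaved
crawlers out of the gallery (the /gallery/ path is an assumption about
where the app is installed; adjust it to match your URL layout):

```
User-agent: *
Disallow: /gallery/
```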

> It's also been suggested that I mess with this on the LAN, removing WAN
> requests. I'll try that out this weekend to see if I can duplicate the cpu
> loading with some known url request.

Good deal.
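A simple way to do that is to time a request for the suspect page from
another LAN box while watching top on the server.  The host and path
below are placeholders for your own server and gallery URL:

```shell
# Fetch the slideshow page once and report wall-clock time; a CPU-bound
# page will show a long "real" time even over an otherwise idle LAN.
time curl -s -o /dev/null http://192.168.1.10/gallery/slideshow.php
```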

> THANKS to all!

Hope we fix this!
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-      "Microsoft is a cross between The Borg and the Ferengi.       -
-  Unfortunately they use Borg to do their marketing and Ferengi to  -
-               do their programming."  -- Simon Slavin              -
----------------------------------------------------------------------

More information about the Redhat-install-list mailing list