more on bogged down server

Ted Potter tpotter at techmarin.com
Wed Apr 12 21:03:48 UTC 2006


On 4/12/06, Harold Hallikainen <harold at hallikainen.com> wrote:
>
> >
> >>
> >>> On Mon, 2006-04-10 at 15:53 -0700, Harold Hallikainen wrote:
> >>>> >
> >>>> >> You have to reduce your load somehow.  Ideally, you should create an
> >>>> >> anonymous FTP download directory and move all of the downloadable
> >>>> >> files to it.  The download directory also is used as the home
> >>>> >> directory for the anonymous FTP user (user "ftp").  I actually use a
> >>>> >> completely separate filesystem entirely for that.  The filesystem is
> >>>> >> mounted so that only root has write access.
> >>>> >>
> >>>> >> Modify your vsftpd.conf file to permit anonymous downloads only and
> >>>> >> start up vsftpd.  Make sure you also set the "force chroot for
> >>>> >> anonymous users" option.  Then change your links on your web pages to
> >>>> >> use "ftp://"-style links pointed at the anonymous download directory
> >>>> >> paths for the downloadable files.
> >>>> >>
> >>>> >> FTP is the protocol to use for large file downloads.  HTTP just isn't
> >>>> >> efficient for that, as you've now found out (the hard way, I might
> >>>> >> add).
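> >>>> >>
> >>>> >> A rough sketch of the relevant vsftpd.conf settings (untested here;
> >>>> >> option names are from a stock vsftpd.conf, and /var/ftp/pub is only
> >>>> >> an example path -- adjust to taste):
> >>>> >>
> >>>> >>     # anonymous downloads only; no local logins, no writes of any kind
> >>>> >>     anonymous_enable=YES
> >>>> >>     local_enable=NO
> >>>> >>     write_enable=NO
> >>>> >>     anon_upload_enable=NO
> >>>> >>     anon_mkdir_write_enable=NO
> >>>> >>     anon_world_readable_only=YES
> >>>> >>     anon_root=/var/ftp/pub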
> >>>> >
> >>>> > As the man said, don't use HTTP for this.
> >>>> >
> >>>> > If you must, then I suggest that you have a separate partition for
> >>>> > your large files and make fstab read as such (example):
> >>>> >
> >>>> >   /dev/web  /  ext3  defaults,directio  1 1
> >>>> >
> >>>> > Add the directio option and the file will not go to RAM, nor to swap.
> >>>> > This will speed things up, but you should hand the downloads over to a
> >>>> > different method (not HTTP).
> >>>> >
> >>>>
> >>>> THANKS! I'll see what I can do about moving stuff to ftp. Most of the
> >>>> large files are on phpwiki. I suppose I could slowly go through the
> >>>> pages and change the download files (mostly scanned pdfs) from http://
> >>>> to ftp://. Any problem with having the ftp download directory being the
> >>>> same as the http root directory? That way I would not have to move
> >>>> anything, just change the protocol prefix on the links.
> >>>
> >>> Well, I would rather not have anonymous FTP sessions chrooting to my
> >>> web server pages, but that's up to you.  Just realize that you may be
> >>> opening some security holes.  Make sure the anonymous FTP user doesn't
> >>> have ANY form of write access to the files (make sure they're owned by
> >>> someone other than "ftp" and group "ftp") and make sure the directories
> >>> are owned by a secure user and have 755 permissions ("rwxr-xr-x").
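> >>>
> >>> Something along these lines would do it (just a sketch -- substitute your
> >>> actual download directory for /var/ftp/pub):
> >>>
> >>>     chown -R root:root /var/ftp/pub
> >>>     find /var/ftp/pub -type d -exec chmod 755 {} \;
> >>>     find /var/ftp/pub -type f -exec chmod 644 {} \;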
> >>>
> >>>> Here's another top. It looks like 98.6% of RAM is being used, along
> >>>> with 100% of the processor. Not much swap space is being used. It kinda
> >>>> seems like Apache just tries to use as much RAM and CPU as is
> >>>> available. I guess this would be ok if my DSL could send the data out
> >>>> faster (I'm sure that's why these threads live so long). But it doesn't
> >>>> seem like it should take 7% or more of the CPU to move a byte from the
> >>>> drive to the ethernet. Maybe ftp's just more efficient at this?
> >>>
> >>> Yes, FTP is much more adept at this.  You also have to remember that
> >>> you have an entire copy of Apache running for each connected user.  If
> >>> you were to do a "vmstat 3", you'll probably see a hell of a lot of
> >>> context switches going on (the stuff under the "cs" column), and that's
> >>> where your CPU is going.  A context switch occurs when the system
> >>> switches from executing one program to another.  There are conditions
> >>> where the system spends all its time switching and not doing anything
> >>> else (we lovingly call this the "scratching the process itch").
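> >>>
> >>> For example, ten samples at three-second intervals while a few downloads
> >>> are running:
> >>>
> >>>     vmstat 3 10
> >>>
> >>> The context switches show up in the "cs" column, and the "us"/"sy"
> >>> columns show where the CPU time is going.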
> >>>
> >>> As another poster has commented, strip your Apache down as much as
> >>> possible.  If you aren't using it, kill off mod_perl (it's a huge
> >>> resource hog), and optimize and precompile your PHP stuff (using Zend).
> >>> But your best bet is FTP.
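> >>>
> >>> Disabling mod_perl is usually just a matter of commenting out its
> >>> LoadModule line (on Red Hat-style systems it often lives in
> >>> /etc/httpd/conf.d/perl.conf -- check your own layout) and restarting:
> >>>
> >>>     # LoadModule perl_module modules/mod_perl.so
> >>>     service httpd restart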
> >>>
> >>> Also note that residential DSL is generally optimized to have a big
> >>> incoming pipe (for downloads FROM the net TO you) and a much smaller
> >>> pipe going the other way.  And the upload pipe is usually time-
> >>> multiplexed...you share the upload bandwidth with other users.  So,
> >>> rather than sending a lot of data to a client, you can only send a
> >>> little bit, then the system switches to another task, sends a tiny bit
> >>> there, switches to yet another and so on and so on.  This is obviously
> >>> not the best scenario for running an FTP or Wiki site.
> >>>
> >>> FTP will help (since it's a lighter process and the protocol is
> >>> optimized for sending lots of data), but a large part of your problem is
> >>> likely the DSL connection itself, and I can't help much with that except
> >>> tell you to see if your DSL provider can give you a symmetrical DSL
> >>> connection (a.k.a. "business DSL")--and that'll probably cost you more.
> >>> That's why there are companies that offer co-location, managed servers
> >>> or web hosting services.  They have high speed, bidirectional pipes to
> >>> the internet and you (usually) don't get bottlenecked at the network
> >>> pipe, which is what you're experiencing.
> >>>
> >>> Sorry I can't help more than that.
> >>>
> >>
> >>
> >> As usual, you're a tremendous help! It's interesting that this problem
> >> did not appear until last week (when I was out of town). I wonder if
> >> maybe yum did an update on Apache for me and made it take a lot of cpu
> >> time. Does Apache normally take everything it can get, but, ideally, for
> >> a short period of time?
> >>
> >> Looking at my Apache logs, I see that most of my traffic is from search
> >> engines. Most of my files are pretty small, but there are a few pdfs that
> >> are large image files. So far I do not have a robots.txt, but am thinking
> >> of adding one to decrease the load due to search engines. I'd like stuff
> >> to be indexed, but maybe either decrease the frequency of indexing or
> >> tell them to not index the pdfs (which have ocr text in the background).
> >> I've only spent about 5 minutes looking at info about robots.txt so far,
> >> but do you think that could help?
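> >>
> >> What I have in mind is something like this ("/pdfs/" is just a placeholder
> >> here for wherever the big scans actually live):
> >>
> >>     User-agent: *
> >>     Disallow: /pdfs/
> >>     Crawl-delay: 60
> >>
> >> (Not every crawler honors Crawl-delay, but the Disallow line is widely
> >> respected.)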
> >>
> >> THANKS!
> >>
> >> Harold
> >
> >
> > Following up on my own post, the large pdfs are all in a single directory,
> > so I've put that in my new robots.txt file. I'll watch top through today
> > to see if that helps. Again, most of my traffic seems to be search engines
> > (my content is popular, but not THAT popular), so I don't think there'll
> > be a problem with real users getting the large files (I just read on one
> > of my mailing lists of a user downloading a 3M file successfully). At
> > least that's my hope! We'll see.
> >
> > On the suggested changes to my httpd.conf, I do have a few perl scripts. If
> > I take out mod_perl, does Apache just hand it over to perl for processing
> > instead of doing some (or all) of it itself? The perl scripts that are
> > called by apache are pretty small and should execute quickly. Same with
> > the php stuff. I think they may take a lot of resources, but only for a
> > very short period of time. I think my major problem is those large pdfs
> > being sucked up by search engines. By stopping those, I hope I'll speed up
> > stuff for everyone.
> >
> > It IS interesting that I've had this stuff up for months and months on
> > this server and only recently started having problems. We'll see how it
> > goes.
> >
> > I REALLY appreciate all the help on this list!
> >
> > Harold
> >
>
>
> I'm still trying to figure out what changed on this server that is getting
> overloaded. I had started Webmin and its bandwidth monitoring a little
> before I first noticed the problem, so I turned that off last night. I've
> also added a robots.txt disallowing the directory where all the large pdfs
> are and setting crawl-delay to 60, hoping to reduce the load from search
> engines. I notice, running top, that httpd seems to take all the cpu
> cycles. If just one instance of httpd is running, it will maybe take 98% of
> the cpu. If a bunch are running, it seems to divide things out so the total
> is close to 100%. I see in the Apache documentation that there's a config
> directive called RLimitCPU, but it appears to limit the number of CPU
> seconds, not the percentage of cycles. Is there some way to keep httpd from
> taking over the system, or is this perhaps not a problem?
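>
> (For reference, RLimitCPU takes soft and hard limits in seconds -- e.g.
> "RLimitCPU 60 120" -- and it applies to processes forked off by httpd, such
> as CGI scripts, not to the httpd children themselves, so it doesn't look
> like what I'm after anyway.)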
>
> THANKS!
>
> Harold

Since I know very little about this, I will open my mouth.

If I understand correctly, your server was working fine at one point and then
"suddenly" broke.

Can you try to simulate the problem from the LAN side? That is, take it off
the WAN (internet) and "attack" your server from the LAN clients in order to
watch how the server responds.

I thought Apache would only spawn a new server process if it ran out of
muscle on the current requests, and there is a config limit you can set -
I am guessing here.
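
If I remember right, the knobs live in httpd.conf and look something like
this (the numbers here are only placeholders, not a recommendation):

    StartServers          8
    MinSpareServers       5
    MaxSpareServers      20
    MaxClients           50
    MaxRequestsPerChild 4000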

Another (not so fun) idea - turn off EVERYTHING but the Apache server and
see how the box does then.

Silly idea, but check your hardware. Is the memory broken? Has the HD got a
bad sector and is spinning away mindlessly? Is the network connection OK at
the proper speed, duplex, etc.? Is there enough disk space?
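
A few quick ways to check those (assuming a fairly stock Red Hat box; "eth0"
is just a guess at your interface name):

    df -h                  # disk space
    free -m                # memory in use
    ethtool eth0           # link speed and duplex
    dmesg | grep -i error  # disk or memory complaints from the kernel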

I vote for the LAN tests myself; you can exercise control over the bandwidth.

A few months back I suddenly could not call up certain sites - drove myself
nuts until I went in and reconfigured the router. There appeared to be
nothing wrong with the router, and even the bandwidth speed tests were fine,
but once I did the reconfigure the problem went away - go figure.

hth










--
Ted Potter
tpotter at techmarin.com



