more on bogged down server

Harold Hallikainen harold at hallikainen.com
Tue Apr 11 15:20:57 UTC 2006


>
>> On Mon, 2006-04-10 at 15:53 -0700, Harold Hallikainen wrote:
>>> >
>>> >> You have to reduce your load somehow.  Ideally, you should create
>>> >> an anonymous FTP download directory and move all of the
>>> >> downloadable files into it.  The download directory is also used
>>> >> as the home directory for the anonymous FTP user (user "ftp").  I
>>> >> actually use a completely separate filesystem for that, mounted so
>>> >> that only root has write access.
>>> >>
>>> >> Modify your vsftpd.conf file to permit anonymous downloads only
>>> >> and start up vsftpd.  Make sure you also set the "force chroot for
>>> >> anonymous users" option.  Then change the links on your web pages
>>> >> to "ftp://"-style links pointed at the anonymous download
>>> >> directory paths for the downloadable files.
>>> >>
>>> >> FTP is the protocol to use for large file downloads.  HTTP just
>>> >> isn't efficient for that, as you've now found out (the hard way, I
>>> >> might add).
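>>> >>
>>> >> The whole download-only setup is just a handful of lines in
>>> >> vsftpd.conf, something along these lines (the anon_root path here
>>> >> is only an example; point it at whatever filesystem you dedicate
>>> >> to downloads):
>>> >>
>>> >>   # anonymous, read-only downloads and nothing else
>>> >>   anonymous_enable=YES
>>> >>   local_enable=NO
>>> >>   write_enable=NO
>>> >>   anon_upload_enable=NO
>>> >>   anon_mkdir_write_enable=NO
>>> >>   # home of the anonymous "ftp" user (example path)
>>> >>   anon_root=/var/ftp/pub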
>>> >
>>> > As the man said, don't use HTTP for this.
>>> >
>>> > If you must, then I suggest that you put your large files on a
>>> > separate partition and make its fstab entry read as such (example):
>>> >
>>> > /dev/web /                       ext3    defaults,directio        1 1
>>> >
>>> > Add the directio option and the file will not go to RAM, nor to
>>> > swap. This will speed things up, but you should hand the downloads
>>> > over to a different method (not HTTP).
>>> >
>>>
>>> THANKS! I'll see what I can do about moving stuff to FTP. Most of
>>> the large files are on phpwiki. I suppose I could slowly go through
>>> the pages and change the download links (mostly scanned PDFs) from
>>> http:// to ftp://. Any problem with having the FTP download
>>> directory be the same as the HTTP root directory? That way I would
>>> not have to move anything, just change the protocol prefix on the
>>> links.
>>
>> Well, I would rather not have anonymous FTP sessions chrooting to my
>> web server pages, but that's up to you.  Just realize that you may be
>> opening some security holes.  Make sure the anonymous FTP user doesn't
>> have ANY form of write access to the files (make sure they're owned by
>> someone other than "ftp" and group "ftp") and make sure the directories
>> are owned by a secure user and have 755 permissions ("rwxr-xr-x").
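>>
>> For example, assuming the download tree lives under /var/ftp/pub (use
>> whatever path you actually export):
>>
>>   chown -R root:root /var/ftp/pub
>>   find /var/ftp/pub -type d -exec chmod 755 {} \;
>>   find /var/ftp/pub -type f -exec chmod 644 {} \;
>>
>> That leaves everything readable by the anonymous user but writable
>> only by root.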
>>
>>> Here's another top. It looks like 98.6% of RAM is being used, along
>>> with 100% of the processor. Not much swap space is being used. It
>>> kinda seems like Apache just tries to use as much RAM and CPU as is
>>> available. I guess this would be OK if my DSL could send the data
>>> out faster (I'm sure that's why these threads live so long). But it
>>> doesn't seem like it should take 7% or more of the CPU to move a
>>> byte from the drive to the Ethernet. Maybe FTP's just more efficient
>>> at this?
>>
>> Yes, FTP is much more adept at this.  You also have to remember that
>> you have an entire copy of Apache running for each connected user.  If
>> you were to do a "vmstat 3", you'd probably see a hell of a lot of
>> context switches going on (the stuff under the "cs" column), and
>> that's where your CPU is going.  A context switch occurs when the
>> system switches from executing one program to another.  There are
>> conditions where the system spends all its time switching and not
>> doing anything else (we lovingly call this "scratching the process
>> itch").
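>>
>> For example, leave this running in another terminal while the server
>> is busy:
>>
>>   vmstat 3
>>
>> and watch the "cs" column (context switches per second) along with
>> "us" and "sy" (user and system CPU time).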
>>
>> As another poster has commented, strip your Apache down as much as
>> possible.  If you aren't using it, kill off mod_perl (it's a huge
>> resource hog), and optimize and precompile your PHP stuff (using
>> Zend).  But your best bet is FTP.
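>>
>> The httpd.conf side of that is small; roughly something like this
>> (the module path and the numbers are only illustrative, tune them
>> for your box):
>>
>>   # comment out modules you don't actually use
>>   #LoadModule perl_module modules/mod_perl.so
>>
>>   # keep the number of simultaneous Apache children modest
>>   MaxClients        30
>>   KeepAlive         On
>>   KeepAliveTimeout  5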
>>
>> Also note that residential DSL is generally optimized to have a big
>> incoming pipe (for downloads FROM the net TO you) and a much smaller
>> pipe going the other way.  And the upload pipe is usually time-
>> multiplexed...you share the upload bandwidth with other users.  So,
>> rather than sending a lot of data to a client, you can only send a
>> little bit, then the system switches to another task, sends a tiny bit
>> there, switches to yet another and so on and so on.  This is obviously
>> not the best scenario for running an FTP or Wiki site.
>>
>> FTP will help (since it's a lighter process and the protocol is
>> optimized for sending lots of data), but a large part of your problem is
>> likely the DSL connection itself, and I can't help much with that except
>> tell you to see if your DSL provider can give you a symmetrical DSL
>> connection (a.k.a. "business DSL")--and that'll probably cost you more.
>> That's why there are companies that offer co-location, managed servers
>> or web hosting services.  They have high speed, bidirectional pipes to
>> the internet and you (usually) don't get bottlenecked at the network
>> pipe, which is what you're experiencing.
>>
>> Sorry I can't help more than that.
>>
>
>
> As usual, you're a tremendous help! It's interesting that this problem
> did not appear until last week (when I was out of town). I wonder if
> maybe yum did an update on Apache for me and made it take a lot of CPU
> time. Does Apache normally take everything it can get, but, ideally,
> only for a short period of time?
>
> Looking at my Apache logs, I see that most of my traffic is from search
> engines. Most of my files are pretty small, but there are a few PDFs
> that are large image files. So far I do not have a robots.txt, but I am
> thinking of adding one to decrease the load due to search engines. I'd
> like stuff to be indexed, but maybe either decrease the frequency of
> indexing or tell them not to index the PDFs (which have OCR text in the
> background). I've only spent about 5 minutes looking at info about
> robots.txt so far, but do you think that could help?
>
> THANKS!
>
> Harold


Following up on my own post, the large PDFs are all in a single
directory, so I've put that in my new robots.txt file. I'll watch top
through the day to see if that helps. Again, most of my traffic seems to
be search engines (my content is popular, but not THAT popular), so I
don't think there'll be a problem with real users getting the large
files (I just read on one of my mailing lists about a user downloading a
3 MB file successfully). At least that's my hope! We'll see.
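
The robots.txt itself is only a few lines, roughly like this (with a
made-up directory name standing in for mine):

  User-agent: *
  Disallow: /pdf/

I understand some crawlers also honor a Crawl-delay line to slow down
how often they hit the site, though apparently not all of them do.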

On the suggested changes to my httpd.conf: I do have a few Perl scripts.
If I take out mod_perl, does Apache just hand the scripts over to perl
for processing instead of doing some (or all) of it itself? The Perl
scripts that are called by Apache are pretty small and should execute
quickly. Same with the PHP stuff. I think they may take a lot of
resources, but only for a very short period of time. I think my major
problem is those large PDFs being sucked up by search engines. By
stopping those, I hope I'll speed things up for everyone.
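
My guess is that without mod_perl they just fall back to ordinary CGI,
with Apache forking a separate perl process per request, using the
stock httpd.conf lines (the paths here are only examples, not my actual
layout):

  ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"
  <Directory "/var/www/cgi-bin">
      AllowOverride None
      Options None
  </Directory>

For small scripts that should be fine.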

It IS interesting that I've had this stuff up for months and months on
this server and only recently started having problems. We'll see how it
goes.

I REALLY appreciate all the help on this list!

Harold



