
Re: ext3 with quota under heavy load.



Sorry for the several-month delay; the server had stopped "crashing"
during that time, so I hadn't fully reported back.  I have since
changed the backup script and can now trigger the "crash" almost at
will.  I'm including the prior conversation for completeness.

/home is a 325 GB partition on a hardware RAID controller.  The system
is now running a vanilla 2.4.22 (vfsv0 quotas are included now, no -ac
needed).

This is what the backup command was changed to:

/bin/nice -n 19 /usr/bin/find /home/ -type d -mindepth 1 -maxdepth 1
-exec /usr/bin/rsync -aH {} backup backupserver::backup/home \;

This command locks up almost daily.  The system is still responsive,
however no data may be written to /home/ during this time. 'touch
/home/tmp' just sits there...

I hadn't been able to get to the system in the past, but now that I can
reproduce the crash almost at will, I'll be more able to test the
situation.

Thanks for your prior and current help.  I'm looking forward to hearing
suggestions on how I can track down what's locking and find a solution.
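
As a starting point for seeing what is stuck during a hang, here is a
minimal sketch that lists processes in uninterruptible sleep (state D,
as in the ps snippets quoted below) together with the kernel function
they are waiting in.  It assumes a procps-style ps that accepts the
"wchan:32" column width; the function names filter_dstate and
list_blocked are made up for illustration.

```shell
#!/bin/sh
# Keep the header row plus every row whose STAT column starts with "D"
# (uninterruptible sleep).  Expects "PID STAT WCHAN COMMAND" style input.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List blocked processes along with the kernel wait channel (WCHAN).
# Assumption: ps is from procps and supports -eo with a width suffix.
list_blocked() {
    ps -eo pid,stat,wchan:32,args | filter_dstate
}
```

Running list_blocked a few times while 'touch /home/tmp' is hanging
should show whether the same processes stay in D and where they are
waiting.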

Thanks,

Dale


--- Andreas Dilger <adilger clusterfs com> wrote:
> On Jun 26, 2003  06:46 -0700, Dale wrote:
> > I have a problem with an NFS server for my network.  It has run
> > kernels 2.4.18-ac4 - 2.4.21-ac1, all with problems.  The -ac patches
> > are used to provide the new style quota support.  The system seems
> > to have gotten even less stable with the new kernel versions.
> > 
> > This morning around 5 am, I got a page that the system was not
> > responding to NFS requests.  I ssh'd in, and found the loadavg at
> > ~50.  Below are some snippets from ps at the time:
> > 
> > root      3414  0.8  0.1  3904 3048 ?        DN   04:02   1:45
> > /usr/bin/updatedb -f NFS,SMBFS,NCPFS,PROC,DEVPTS -e /tmp,/var/tmp,/us
> > root      3979  0.0  0.0  2588 1192 ?        DN   04:14   0:00
> > /usr/bin/rsync -aH --delete /home/puser1 /home/puser2 /home/puser3
> > 
> > The rsync command is backing up across the network to a backup NFS
> > server.  updatedb starts at 4:02 am, and the rsync had been running
> > since 3:30 and was half-way completed (estimated by the 'p' in the
> > username).
> > 
> > Also there were 32 nfsd's just like this:
> > root  851  0.0  0.0   0    0 ?    DW   Jun19   4:35 [nfsd]
> > 
> > and these, the other 4 kjournald's were in SW.
> > root   7  0.1  0.0   0    0 ?     DW   Jun19  17:04 [kswapd]
> > root 144  0.0  0.0   0    0 ?     DW   Jun19   6:53 [kjournald]
> > 
> > I'm wondering what my options are; this has happened ~10 times in the
> > last 6 months, although the system went a period of ~120 days without
> > a hiccup.  This last time on 2.4.21-ac1 was only 6 days.
> > It wouldn't be so bad if a `shutdown -r now` would restart it, but it
> > hangs while shutting down nfs and during killall and needs a hard
> > reboot.
> 
> This almost certainly is a lock deadlock of some sort.  I've had pretty
> good luck in debugging such problems just by running "sysrq-T" on the
> console and/or using "crash" to examine the running kernel.  This needs
> a fair amount of knowledge of the various locks in ext3.  The most
> common problems are related to lock ordering problems with some process
> starting a journal transaction and then blocking on a lock (e.g.
> directory or inode semaphore, or superblock lock), and some other
> process holding that lock and trying to start a new transaction when
> the journal is full.
> 
> The journal being full is a crucial issue, because if it isn't full you
> can start a new transaction without problems, but when it is full you
> need to flush the journal and wait for all existing users to free up
> their handles, which will never happen if the first process has a
> transaction handle and is blocked waiting for a lock the second process
> is holding.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> http://sourceforge.net/projects/ext2resize/
> http://www-mddsp.enel.ucalgary.ca/People/adilger/
> 

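For the "sysrq-T" suggestion above, when nobody is at the console, a
rough sketch of requesting the same task dump from an ssh session
follows.  It assumes the running kernel exposes /proc/sysrq-trigger,
which may not be true of every 2.4 build; the function name
dump_task_states is made up for illustration, and the trigger path is a
parameter only so the function can be tried against a scratch file.

```shell
#!/bin/sh
# Ask the kernel to dump every task's state and stack to the kernel log,
# as SysRq-T on the console would.  The default path is an assumption
# about this kernel; pass another path to try the function safely.
dump_task_states() {
    trigger=${1:-/proc/sysrq-trigger}
    if [ ! -w "$trigger" ]; then
        echo "cannot write to $trigger (not root, or no sysrq support?)" >&2
        return 1
    fi
    echo t > "$trigger"
}
```

After calling dump_task_states as root, the dump can be read back with
dmesg and should be saved somewhere other than /home, since writes there
block during the hang.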
