[linux-lvm] LVM onFly features

Michael Loftis mloftis at wgops.com
Mon Dec 12 01:14:39 UTC 2005



--On December 12, 2005 9:15:39 AM +1100 Nathan Scott <nathans at sgi.com> 
wrote:


>> XFS has terrible unpredictable performance in production.  Also it has
>> very
>
> What on earth does that mean?  Whatever it means, it doesn't
> sound right - can you back that up with some data please?

The worst problems we had were most likely related to running out of 
journal transaction space.  Under high transaction load, XFS would 
sometimes hang everything while syncing metadata.  From what I 
understand this has supposedly been dealt with, but we were still having 
these issues when we decommissioned the last XFS-based server a year ago. 
Another data point is that we primarily served files via NFS, which XFS 
(at least at the time) still didn't behave well with; as I recall, I 
never saw any good answers on that.

>
>> bad behavior when recovering from crashes,
>
> Details?  Are you talking about this post of yours:
> http://oss.sgi.com/archives/linux-xfs/2003-06/msg00032.html

That particular behavior happened a lot.  What was annoying wasn't so 
much that it happened, but that it happened after the system claimed the 
filesystem was clean.  And yes, that hardware has been fully checked out; 
there's nothing wrong with it.  Honestly, I wish there were, since that 
would make me feel better.  The only explanations I can come up with are 
bugs in the XFS fsck/repair tools, or *maybe* an interaction between XFS 
and the DAC960 controller, or with NFS.  The fact that XFS has weird 
interactions with NFS at all bugs me, but I don't understand the code 
involved well enough to judge.  There might be a decent reason.

>
> There have been several fixes in this area since that post.
>
>> oftentimes its tools totally fail to clean the filesystem.
>
> In what way?  Did you open a bug report?
>
>> It also needs larger kernel stacks because
>> of some of the really deep call trees,
>
> Those have been long since fixed as far as we are aware.  Do you
> have an actual example where things can fail?

We pulled it out of production and replaced XFS with Reiser.  At the time, 
Reiser was far more mature on Linux.  The Linux XFS implementation (in 
combination with other work in the block layer, as you mention later) may 
have matured to at least a similar point by now, possibly further.  I've 
just personally lost more data to XFS than to Reiser.  I've also had 
problems with ext3 in the (now distant) past, while it was still teething.


>> so when you use it with LVM or MD it
>> can oops unless you use the larger kernel stacks.
>
> Anything can oops in combination with enough stacked device drivers
> (although there has been block layer work to resolve this recently,
> so you should try again with a current kernel...).  If you have an
> actual example of this still happening, please open a bug or at least
> let the XFS developers know of your test case.  Thanks.

That was actually part of the problem.  There was no time, and no hardware, 
to try to reproduce the problem in the lab.  This isn't specifically an XFS 
problem; it's really an open source problem.  If you encounter a bug and 
you're unlucky enough to be a bit of an edge case, you had better be 
prepared to pony up hardware and man-hours to diagnose and reproduce it, 
or it might not get fixed.  Again, though, this is common to the whole 
open source community and not specific to XFS, Linux, LVM, or any other 
project.

Having said that, if you can reproduce it and get good details, the open 
source community has a far better track record of *really* fixing and 
addressing bugs than any commercial software vendor.

>
>> We also have had
>> problems with the quota system but the details on that have faded.
>
> Seems like details of all the problems you described have faded.
> Your mail seems to me like a bit of a troll ... I guess you had a
> problem or two a couple of years ago (from searching the lists)
> and are still sore.  Can you point me to mailing list reports of
> the problems you're referring to here or bug reports you've opened
> for these issues?  I'll let you know if any of them are still
> relevant.

No, we actually had dozens.  The only really crippling ones were when XFS 
would suddenly unmount in the middle of the business day for no apparent 
reason.  Without details, bug reports are ignored, and we couldn't really 
provide details or filesystem dumps because there was too much data and 
we had to get it back online.  We just moved away from XFS as fast as we 
could.  It wasn't a one-day or one-week thing; there was a trail of XFS 
crashes at the time.  Sometimes XFS pulled the rug out so badly that the 
machine locked up and the console was wedged pretty badly too.

I wanted to provide this information as a data point from the other side, 
as it were, not to get into a pissing match with the XFS developers and 
community.  XFS is still young, as is ReiserFS.  While Reiser is a 
completely new FS and XFS has roots in IRIX and other implementations, 
their ages are similar, since XFS's Linux implementation is around the 
same age.  If things have changed in the last 6-12 months, then so much 
the better.  The facts are that XFS had many problems while we ran it, and 
still had many unresolved problems when we replaced it with ReiserFS. 
And Reiser has been flawless except for one problem, already mentioned on 
Linux-LVM, that was very clearly caused by an external SAN/RAID problem 
which EMC has corrected.  (Completely as an aside: anyone running a CX 
series REALLY needs to be on the latest code rev.  You might never run 
into the bug, but if you do, it can be ugly.  I'm still not sure exactly 
which one we hit; there were at least two that could have caused the data 
corruption.)


My best guess as to why we had such a bad time with XFS is the XFS+NFS 
interaction, plus possibly an old bug (unknown to me; this is just a 
guess) that created some minor underlying corruption which the repair 
tools couldn't fully diagnose or fix, causing our continual, seemingly 
random problems.  I don't believe in truly random problems, at least not 
in computers.

>
> cheers.
>
> --
> Nathan
>



--
"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler



