
Re: [Linux-cluster] NFS on GFS architectural issues / problems

Wendy Cheng wrote:
> Riaan van Niekerk wrote:
>>
>> My question to you, or to anyone familiar with NFS on GFS or with GFS in general: which of the following are still valid issues for the current (6.1u4) version of GFS? If all or most of them still apply, I can use this as motivation for my customer to strongly consider moving off NFS on GFS. Removing NFS from our GFS cluster has been on the cards for quite a while, but it has not gained momentum because of a lack of information on the performance gains of such a move (very difficult to gauge) and on the architectural problems/limitations of NFS on GFS (for which the following extract is spot-on).
>
> These have been worked on, and some of them do have test patches ready to address the issues. However, the changes are non-trivial and may involve base kernel modifications that need upstream (community Linux kernel) acceptance. This takes time, since we would like to proceed conservatively to preserve GFS1/2 stability. Unless the posted problems are urgent for you (let us know), the current NFS-GFS development focus is on failover (Red Hat Bugzilla 132823).
>
> Is performance your primary concern right now?
>
> -- Wendy

Yes, mostly. We also have a couple of open service requests for stability problems. They are very intermittent and not reproducible (and nothing in Bugzilla seems to match):

a) The load average on the nodes climbs steadily until it reaches the nfsd thread count, at which point all I/O hangs. We reboot the nodes one by one, and as soon as the node holding a stuck lock is bounced, I/O resumes on all nodes.

b) The kernel oopses with "Assertion failed" on line 428/357 of dlm/lock.c while there is no load on the system. This happens 3 days in a row, over a weekend, and then the error does not occur again for weeks.

Getting the information that support requires (SysRq-T output, lock dumps, etc. from all nodes, plus a crash dump from the failing node) is pretty difficult. We are not married to NFS on GFS, even though it is a cost-effective interim step until we can get all our mail servers (14 in all) SAN-attached.

Can I take "have been worked on" and "some do have test patches" to mean that these 4 issues still persist? I need the ammunition to motivate the move away from NFS on GFS, and this architecture document gives it to me if these issues are still valid.


-- 
Riaan van Niekerk
Systems Architect
Obsidian Systems (Obsidian Red Hat Consulting)
Email: riaan obsidian co za
Tel (work): +27 11 792 6500
Fax: +27 11 792 6522
Cell: +27 82 921 8768
