[Linux-cluster] dlm and IO speed problem <er, might wanna get a coffee first ; )>

Wendy Cheng s.wendy.cheng at gmail.com
Fri Apr 11 15:28:37 UTC 2008


christopher barry wrote:
> On Tue, 2008-04-08 at 09:37 -0500, Wendy Cheng wrote:
>   
>> gordan at bobich.net wrote:
>>     
>>>       
>>>> my setup:
>>>> 6 rh4.5 nodes, gfs1 v6.1, behind redundant LVS directors. I know it's
>>>> not new stuff, but corporate standards dictated the rev of rhat.
>>>>         
>>> [...]
>>>       
>>>> I'm noticing huge differences in compile times - or any home file access
>>>> really - when doing stuff in the same home directory on the gfs on
>>>> different nodes. For instance, the same compile on one node is ~12
>>>> minutes - on another it's 18 minutes or more (not running concurrently).
>>>> I'm also seeing weird random pauses in writes, like saving a file in vi,
>>>> what would normally take less than a second, may take up to 10 seconds.
>>>>         
>
> Anyway, thought I would re-connect to you all and let you know how this
> worked out. We ended up scrapping gfs. Not because it's not a great fs,
> but because I was using it in a way that was playing to its weak
> points. I had a lot of time and energy invested in it, and it was hard
> to let it go. Turns out that connecting to the NetApp filer via nfs is
> faster for this workload. I couldn't believe it either, as my bonnie and
> dd type tests showed gfs to be faster. But for the use case of large
> sets of very small files, and lots of stats going on, gfs simply cannot
> compete with NetApp's nfs implementation. GFS is an excellent fs, and it
> has its place in the landscape - but for a development build system,
> the NetApp is simply phenomenal.
>   

Assuming you ran both configurations (nfs-wafl vs. gfs-san) on the very 
same NetApp box (?) ...

Both configurations have their pros and cons. The wafl-nfs setup runs in 
native mode, which certainly has its advantages - you've made a good 
choice - but the latter (gfs-on-netapp san) can work well in other 
situations. The biggest problem with your original configuration is the 
load balancer. Round-robin (and its variants) scheduling will not work 
well if you have a write-intensive workload that has to fight for locks 
between multiple GFS nodes. IIRC, there are GFS customers running 
build-compile development environments. They normally assign groups of 
users to different GFS nodes, say user ids starting with a-e on node 1, 
f-j on node 2, etc. - something like the sketch below.
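
As an illustration only (node names and the exact grouping rule here are 
made up, not from anyone's actual setup), the idea is a fixed, 
deterministic user-to-node mapping, so a given user's locks always stay 
on one node:

    # Hypothetical sketch: pin groups of users to specific GFS nodes
    # instead of letting a round-robin load balancer bounce them around.
    NODES = ["gfs-node1", "gfs-node2", "gfs-node3"]

    def node_for_user(username):
        """Same user always lands on the same node (a-e -> node 1,
        f-j -> node 2, everyone else -> node 3)."""
        first = username[0].lower()
        if "a" <= first <= "e":
            return NODES[0]
        if "f" <= first <= "j":
            return NODES[1]
        return NODES[2]

    for user in ["alice", "george", "zoe"]:
        print(user, "->", node_for_user(user))

That way writes under a given home directory never have to bounce GFS 
locks between nodes.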

One encouraging piece of news from this email is that gfs-netapp-san 
runs well under bonnie. GFS1 has been struggling with bonnie (large 
numbers of smaller files within one single node) for a very long time. 
One of the reasons is that its block allocation tends to get spread 
across the disk whenever there is resource group contention. It is very 
difficult for the Linux IO scheduler to merge these scattered blocks 
within one single server. When the workload becomes IO-bound, the locks 
are subsequently stalled and everything starts to snowball after that. 
The NetApp SAN has one more layer of block allocation indirection within 
its firmware, and its write speed is "phenomenal" (I'm borrowing your 
words ;) ), mostly due to the NVRAM, where it can aggressively cache 
write data - this helps relieve GFS's small-file issue quite well.
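
To make the merging point concrete, here is a toy illustration (not GFS 
code - the block numbers are invented): the IO scheduler can only merge 
requests for adjacent disk blocks, so a scattered allocation leaves it 
almost nothing to merge:

    def count_ios_after_merging(blocks):
        """Count the IOs left once runs of consecutive block
        numbers are merged into single larger requests."""
        if not blocks:
            return 0
        ios = 1
        for prev, cur in zip(blocks, blocks[1:]):
            if cur != prev + 1:  # gap on disk -> a separate IO
                ios += 1
        return ios

    contiguous = list(range(64))           # within one resource group
    scattered = list(range(0, 64 * 8, 8))  # spread across the disk

    print(count_ios_after_merging(contiguous))  # 1 merged IO
    print(count_ios_after_merging(scattered))   # 64 separate IOs

Same amount of data, but 64 separate disk operations instead of one once 
the allocation is spread out.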

-- Wendy



