
[Linux-cluster] GFS2 locking in a VM based cluster (KVM)


Sorry to resurrect an old thread, but I can confirm this, too. I have a libvirt setup with multipathed FC SAN devices and KVM guests running on top of it. The physical machine is an HP 465c G7 (2 x 12-core Magny-Cours with 96GB RAM). The host OS is Fedora 14 and the guests are Scientific Linux 6. With a 10GB shared LUN formatted as gfs2 and mounted on both VMs, I started ping_pong some_file 3 on one of the VMs and got ~600k plocks/sec. Then I started ping_pong the_same_file 3 on the second VM and got around 360 plocks/sec (that is 360, not 360 000). No matter what I tried, I couldn't improve it. If I stopped ping_pong on one of the VMs, the rate went up to around 500-550 plocks/sec (again 550, not 550k). After stopping the process, waiting a while, and starting it again on a single machine, I was back at around 600k plocks/sec. I could reproduce this with both tcp and sctp, and I tried a bunch of different settings.
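For anyone who wants to reproduce the measurement without the ping_pong binary, here is a rough Python sketch of the same idea: take a POSIX byte-range lock on the next byte, release the previous one, and count cycles per second. This is not the ping_pong source, only an approximation of its locking pattern; the function name measure_plocks and its parameters are made up for illustration, and Python overhead means the absolute numbers will not match the figures above.

```python
import fcntl
import os
import sys
import time


def measure_plocks(path, num_bytes=3, duration=1.0):
    """Return approximate lock/unlock cycles per second on `path`.

    Hypothetical helper: walks an exclusive fcntl byte-range lock
    around the first `num_bytes` bytes of the file for `duration` seconds.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Make sure the byte range we lock actually exists in the file.
        if os.fstat(fd).st_size < num_bytes:
            os.ftruncate(fd, num_bytes)
        count = 0
        offset = 0
        # Hold the first byte, then move the lock forward one byte per cycle.
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, offset, os.SEEK_SET)
        start = time.monotonic()
        deadline = start + duration
        while time.monotonic() < deadline:
            nxt = (offset + 1) % num_bytes
            # Blocks here if another node currently holds byte `nxt`.
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, nxt, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)
            offset = nxt
            count += 1
        elapsed = time.monotonic() - start
        return count / elapsed
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Usage: python plocks.py /path/on/shared/fs/some_file
    rate = measure_plocks(sys.argv[1], num_bytes=3, duration=5.0)
    print(f"{rate:.0f} plocks/sec")
```

Run it against the same file on one VM first, then on both at once. The second run is the interesting one, since that is where the locks have to bounce between the nodes.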

Then I decided to give ocfs2 a chance. Compiling the module on SL6 (and, I suppose, on RHEL6) is not the most straightforward task, but half an hour later I had the module compiled from the sources of the EL kernel. I stripped all debug symbols, copied the ocfs2 kernel module directory to both VMs, ran depmod -a, and set up ocfs2 on the same LUN. I ran ping_pong the_same_file_i_used_in_the_first_test 3 on just one machine while both VMs had the LUN mounted: 1600k plocks/sec (as in ~1 600 000). Then I started ping_pong on the second VM. The rate did not move at all: still 1600k plocks/sec. I also tested with the real-life app. It worked very well, unlike on gfs2, which was painfully slow with just 2 users. I created the ocfs2 filesystem with -T mail and didn't do any tuning on it, either.

I'm not trying to bash gfs2. I would actually prefer it over ocfs2 anytime, but it seems it doesn't work well in VMs for some reason. I tried both MTU 1500 and MTU 9000, and it made no difference, no matter what else I changed. I haven't tested the same setup on two physical nodes, but I have the feeling gfs2 would work there just as well as ocfs2 does on the VMs. I didn't test with hugepages for the VMs, but I doubt that would make much of a difference.

I think this should be investigated by someone at RH, since they are the driving force behind KVM, libvirt, the cluster software, and gfs2.
