RE: [Linux-cluster] Fenced node never reboots properly

What type of VMware environment are you running? (VI ESX 3, Server, Workstation, or one of the older platforms?)


The VMware forums have a fair amount of help on how to handle clock drift. Are you on AMD or Intel, 32-bit or 64-bit?


From: Jeroen van den Horn
Sent: Friday, March 30, 2007 10:07 AM
To: linux clustering
Subject: Re: [Linux-cluster] Fenced node never reboots properly


In response to Lon's suggestion I modified the fence_vmware code and set the reset type to HARD; the cluster node now resets properly. The remaining issue is that under VMware we are still experiencing performance problems. It's as if a node in the cluster starts 'lagging behind' (its system clock also starts drifting), and after some time one of the nodes declares the other dead.

Does anybody have any pointers on performance issues and/or clock drift with GFS on virtual machines?
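
To put a number on the drift before comparing notes, something like the following could help. It is only a rough sketch: the function name drift_ppm is made up for illustration, and it assumes you can sample the guest clock and a trusted reference clock (for example an NTP-synced machine) at two points in time. How you obtain the reference readings is left open.

```python
def drift_ppm(local_start, ref_start, local_end, ref_end):
    """Return the guest clock drift relative to a reference clock, in ppm.

    All four arguments are timestamps in seconds. A negative result means
    the guest clock ran slower than the reference over the interval.
    (Hypothetical helper, not part of any cluster or VMware tool.)
    """
    ref_elapsed = ref_end - ref_start
    if ref_elapsed <= 0:
        raise ValueError("reference interval must be positive")
    local_elapsed = local_end - local_start
    return (local_elapsed - ref_elapsed) / ref_elapsed * 1e6


if __name__ == "__main__":
    # Example: the guest counted 3599 s while the reference counted 3600 s.
    print("drift: %.1f ppm" % drift_ppm(0.0, 0.0, 3599.0, 3600.0))
```

Comparing the figure across nodes, and across idle versus loaded periods, might show whether the node that "lags behind" is also the one drifting most.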


I'm using fence_vmware, which I downloaded from some CVS repository. Good to hear that that is the issue. I'll take a look at the source and see whether the VMware API supports some sort of 'hard reset'.


Lon Hohberger wrote:

On Thu, Mar 29, 2007 at 10:04:00AM +0200, Jeroen van den Horn wrote:

However during shutdown node 2 executes /etc/rc6.d/S31umountnfs (it's a
Debian system) which also attempts to unmount the GFS disk - result:
kernel OOPS. The system continues shutdown until it says 'Will now
restart.' but that's the end of it. I've tried setting the
/proc/sys/kernel/panic and added 'panic=5' to the kernel boot options
but to no avail.
I'm really at a loss here - does anybody have any suggestions on how to
solve this problem?

Yes, it's supposed to be killed (immediately) when fenced, not
gracefully attempting to shut down. What fencing agent are you using?
It sounds like there's a bug.

-- Lon
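
For anyone retracing the panic settings mentioned above, here is a small sketch for confirming that they are actually in effect on the running kernel. The helper names are made up for illustration. It reads /proc/cmdline, /proc/sys/kernel/panic and /proc/sys/kernel/panic_on_oops, which are standard Linux interfaces. Note that the panic timeout only applies once the kernel really panics, and an oops is treated as a panic only when panic_on_oops is non-zero. This checks the settings only; it does not explain the hang at 'Will now restart.'

```python
from pathlib import Path


def panic_timeout_from_cmdline(cmdline):
    """Return the value of the last valid 'panic=N' boot option, or None."""
    timeout = None
    for token in cmdline.split():
        if token.startswith("panic="):
            try:
                timeout = int(token[len("panic="):])
            except ValueError:
                pass  # skip malformed values such as 'panic=abc'
    return timeout


def read_int(path):
    """Read one integer from a file such as a /proc entry, or return None."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def report():
    """Print the panic-related settings visible on this machine."""
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""  # not on Linux, or /proc is not mounted
    print("panic= on boot line :", panic_timeout_from_cmdline(cmdline))
    print("kernel.panic        :", read_int("/proc/sys/kernel/panic"))
    print("kernel.panic_on_oops:", read_int("/proc/sys/kernel/panic_on_oops"))


if __name__ == "__main__":
    report()
```

A value of None means the setting could not be read or was not given.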


