[Linux-cluster] Problem with fenced on cluster with 2 BladeCenter machines: 1st machine is removed physically. The remaining one does not become Active (waiting for fenced)

Thistle, Scott Sthistle at gov.nl.ca
Thu Jul 12 16:24:29 UTC 2007


How do I submit a ticket? Call RH support? 

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of James Parsons
Sent: Thursday, July 12, 2007 1:14 PM
To: linux clustering
Subject: Re: [Linux-cluster] Problem with fenced on cluster with 2
BladeCenter machines: 1st machine is removed physically. The remaining
one does not become Active (waiting for fenced)

Thistle, Scott wrote:

>I am having the same issue. If a blade is not present (i.e. removed for
>maintenance), fence_bladecenter cannot check its state, as the bay is
>reported empty. I think it is something simple to fix for those versed
>in Perl. Normally the fence agent only runs against a blade that is
>present; if the blade is removed while the cluster is running, you run
>into this issue.
>
I believe this is what you want to happen: if the state cannot be
checked, fenced keeps trying. How else could it determine that it was
safe to stop, short of persisting some value like the number of fence
attempts and trying to reason from that? This will not happen if you
remove the blade from the cluster before physically removing it. It is a
snap to do this with one of the UIs, if you are not prejudiced against
UIs :).

Also, removing the node from cluster membership before jerking it out of
the rack tells rgmanager to move any services off of it, rather than
having to depend on heartbeat failure to make this happen. For example:
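
(A sketch only, using the cman/rgmanager command-line tools of this era;
"webserver" is a made-up service name, so adapt it to your own setup.)

    # relocate a service off the node you are about to pull
    # ("webserver" is hypothetical; cdrc1-2 is the surviving node
    # from the logs below)
    clusvcadm -r webserver -m cdrc1-2

    # then, on the node being pulled, leave the cluster cleanly and
    # adjust the expected vote count so quorum math still works
    cman_tool leave remove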

That said, if the blade catches fire and a cage IT guy notices and jerks
it out quick (using his IT Oven Mitt, of course), it is silly for fenced
to keep incessantly trying when the thing no longer even exists. Perhaps
the correct solution would be to have fence_bladecenter report success
if the BladeCenter admin unit reports that 'no status is available' for
a particular blade - obviously, if the thing is not there, it should be
safe to say it is fenced :)
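
Roughly, the check might look like this (a sketch only, not the shipped
agent code; it assumes the agent talks to the management module over
Net::Telnet, which is where the 'pattern match timed-out' message below
comes from, and that the MM answers as in Scott's transcript):

    use Net::Telnet;

    my $bay = 2;    # blade number, e.g. taken from the agent's options
    my $t = Net::Telnet->new(Host    => "mm-address",  # hypothetical MM address
                             Timeout => 15,
                             Errmode => "return");     # undef on timeout, no die
    # ... log in to the management module here ...

    # Ask for the blade's status and accept "bay is empty" as an answer.
    $t->print("env -T system:blade[$bay]");
    my ($pre, $match) = $t->waitfor(
        Match   => '/The target bay is empty\./',
        Match   => '/OK/',
        Timeout => 15,
    );
    if (defined $match && $match =~ /target bay is empty/i) {
        # No blade in the bay: it cannot be running, so call it fenced.
        print "success: blade $bay not present\n";
        exit 0;    # fence agents signal success with exit status 0
    }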

If this addresses your situation (I think it does), now would be a
REALLY good time to file a ticket requesting this behavior - like today!
I'll post a fixed version to the ticket when it is ready.

Thanks to Lon for discussing this with me...;)

Regards,

-Jim

>
>My case below. Blade #3 is a good node. Blade #2 was removed. The fence
>does not work with the blade removed.
>
>system> env -T system:blade[3]
>OK
>system:blade[3]> power -state
>On
>system:blade[3]> env -T system:blade[2]
>The target bay is empty.
>system:blade[3]> env -T system:blade[1]
>OK
>system:blade[1]>
>
>-----Original Message-----
>From: linux-cluster-bounces at redhat.com
>[mailto:linux-cluster-bounces at redhat.com] On Behalf Of James Parsons
>Sent: Thursday, July 12, 2007 12:33 PM
>To: linux clustering
>Subject: Re: [Linux-cluster] Problem with fenced on cluster with 2
>BladeCenter machines: 1st machine is removed physically. The remaining
>one does not become Active (waiting for fenced)
>
>catalin.lupescu at bull.net wrote:
>
>  
>
>>Hello!
>>
>>I have a Red Hat cluster made with 2 IBM blade nodes on a BladeCenter
>>chassis.
>>(fenced version 1.32.6)
>>
>>I have done the following test:
>>I have physically removed the node 1 machine (the Active one).
>>The second one never becomes the Active one. The "clustat" command does
>>not print any information.
>>In /var/log/messages we can find the following messages (repeated):
>>
>>Jul 11 17:46:24 cdrc1-2 fenced[4214]: fencing node "cdrc1-1"
>>Jul 11 17:46:38 cdrc1-2 fenced[4214]: agent "fence_bladecenter" 
>>reports: pattern match timed-out at /sbin/fence_bladecenter line 185 
>>Jul 11 17:46:38 cdrc1-2 fenced[4214]: fence "cdrc1-1" failed
>>
>>If node 1 is plugged back in, node 2 becomes the Active one (fencing OK).
>>
>>    
>>
>bz#240509 changed the sleep timeout in the bladecenter agent from 5 to
>10...this is on or about line 193 in /sbin/fence_bladecenter. See what
>yours is set at, and try pushing it out a bit. This minor change is
>making its way through the distribution chain now.
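>
>For reference, the spot in question looks something like this (an
>illustrative sketch, not a verbatim copy of the agent - check your own
>/sbin/fence_bladecenter around that line):
>
>    # give the blade time to finish changing power state before
>    # the agent re-reads its status
>    sleep 10;    # bz#240509 bumped this from 5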
>
>-j
>


--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



