[Linux-cluster] GFS 6.0 node without quorum tries to fence

AJ Lewis alewis at redhat.com
Wed Aug 4 13:54:20 UTC 2004


On Wed, Aug 04, 2004 at 08:12:51AM +0200, Schumacher, Bernd wrote:
> So what I have learned from all the answers is very bad news for me. It
> seems that what happened is what most of you expected. But this means:
> 
> -----------------------------------------------------------------------
> --- One single point of failure in one node can stop the whole gfs. ---
> -----------------------------------------------------------------------
> 
> The single point of failure is:
> The LAN card specified in "nodes.ccs:ip_interfaces" stops working on one
> node, no matter whether this node was master or slave.
> 
> The whole GFS is stopped:
> The rest of the cluster seems to need time to form a new cluster. The
> bad node does not need as much time to switch to Arbitrating mode, so
> it has enough time to fence all the other nodes before it is itself
> fenced by the new master.
> 
> The bad node lives, but it cannot form a cluster. GFS is not working.
> 
> Now all the other nodes will reboot. But after rebooting they cannot join
> the cluster, because they cannot contact the bad node. The LAN card is
> still broken. GFS is not working.
> 
> Did I miss something?
> Please tell me that I am wrong!

Well, I guess I'm confused about how the node with the bad LAN card can
contact the fencing device to fence the other nodes.  If it can't communicate
with the other nodes because its NIC is down, it can't contact the fencing
device over that NIC either, right?  Or are you using some alternate
transport to contact the fencing device?
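
For reference, the "Have 1, need 2" line in mitte's log quoted below is
consistent with a simple majority rule over the three servers listed in
cluster.ccs.  A minimal Python sketch of that arithmetic, assuming lock_gulm
requires a strict majority of the configured servers (an assumption drawn
from the log line, not checked against the gulm source); the function names
are made up for illustration:

```python
def quorum_needed(total_servers: int) -> int:
    """Smallest strict majority of the configured lock servers (assumed rule)."""
    return total_servers // 2 + 1


def has_quorum(reachable: int, total_servers: int) -> bool:
    """True if the servers a node can see, itself included, form a majority."""
    return reachable >= quorum_needed(total_servers)


# Server list taken from the quoted cluster.ccs.
servers = ["oben", "mitte", "unten"]

# With eth0 disabled, mitte can only see itself.
reachable_from_mitte = ["mitte"]

print(f"Have {len(reachable_from_mitte)}, need {quorum_needed(len(servers))}")
print("quorum:", has_quorum(len(reachable_from_mitte), len(servers)))
```

Under that assumption mitte, seeing only itself, is one server short of
quorum, which is what the log reports.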
 
> > -----Original Message-----
> > From: linux-cluster-bounces at redhat.com 
> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of 
> > Schumacher, Bernd
> > Sent: Tuesday, August 3, 2004 13:56
> > To: linux-cluster at redhat.com
> > Subject: [Linux-cluster] GFS 6.0 node without quorum tries to fence
> > 
> > 
> > Hi,
> > I have three nodes oben, mitte and unten. 
> > 
> > Test:
> > I have disabled eth0 on mitte, so that mitte will be excluded. 
> > 
> > Result:
> > Oben and unten are trying to fence mitte and build a new 
> > cluster. OK! But mitte tries to fence oben and unten. PROBLEM!
> >  
> > Why can this happen? Mitte knows that it cannot build a 
> > cluster. See Logfile from mitte: "Have 1, need 2"
> > 
> > Logfile from mitte:
> > Aug  3 12:53:17 mitte lock_gulmd_core[1845]: Client (oben) expired
> > Aug  3 12:53:17 mitte lock_gulmd_core[1845]: Core lost slave quorum. Have 1, need 2. Switching to Arbitrating.
> > Aug  3 12:53:17 mitte lock_gulmd_core[2120]: Gonna exec fence_node oben
> > Aug  3 12:53:17 mitte lock_gulmd_core[1845]: Forked [2120] fence_node oben with a 0 pause.
> > Aug  3 12:53:17 mitte fence_node[2120]: Performing fence method, manual, on oben.
> > 
> > cluster.ccs:
> > cluster {
> >     name = "tom"
> >     lock_gulm {
> >         servers = ["oben", "mitte", "unten"]
> >     }
> > }
> > 
> > fence.ccs:
> > fence_devices {
> >   manual_oben {
> >     agent = "fence_manual"
> >   }     
> >   manual_mitte ...
> > 
> > 
> > nodes.ccs:
> > nodes {
> >   oben {
> >     ip_interfaces {
> >       eth0 = "192.168.100.241"
> >     }
> >     fence { 
> >       manual {
> >         manual_oben {
> >           ipaddr = "192.168.100.241"
> >         }
> >       }
> >     }
> >   }
> >   mitte ...
> > 
> > regards
> > Bernd Schumacher
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com 
> > http://www.redhat.com/mailman/listinfo/linux-cluster
> > 

-- 
AJ Lewis                                   Voice:  612-638-0500
Red Hat Inc.                               E-Mail: alewis at redhat.com
720 Washington Ave. SE, Suite 200
Minneapolis, MN 55414

Current GPG fingerprint = D9F8 EDCE 4242 855F A03D  9B63 F50C 54A8 578C 8715
Grab the key at: http://people.redhat.com/alewis/gpg.html or one of the
many keyservers out there...
-----Begin Obligatory Humorous Quote----------------------------------------
"In this time of war against Osama bin Laden and the oppressive
Taliban regime, we are thankful that OUR leader isn't the spoiled son
of a powerful politician from a wealthy oil family who is supported by
religious fundamentalists, operates through clandestine organizations,
has no respect for the democratic electoral process, bombs innocents,
and uses war to deny people their civil liberties." --The Boondocks
-----End Obligatory Humorous Quote------------------------------------------