[Linux-cluster] 4U5 CSS/CMAN/fence quorum confusion

Patrick Caulfield pcaulfie at redhat.com
Mon Jun 11 07:53:26 UTC 2007


Patrick Caulfield wrote:
> Robert Clark wrote:
>> On Fri, 2007-06-08 at 16:20 +0100, Robert Clark wrote:
>>> On Fri, 2007-06-08 at 13:04 +0100, Patrick Caulfield wrote:
>>>> Robert Clark wrote:
>>>>>   Does anyone know what might cause ccsd to continue to refuse
>>>>> connections for a lack of quorum after cman has decided the cluster is
>>>>> quorate?
>>>> The usual cause of this is the magma plugins either not being installed in the
>>>> right place or not being installed at all. "magma_tool list" will show you which
>>>> plugins are installed; for CMAN you need the magma_sm.so plugin.
>>>   Thanks for the quick reply. I put "magma_tool list" into the script
>>> just before and after trying to start fenced. The output both times is:
>>>
>>> Magma: Checking plugins in /lib/magma
>>>
>>> File            Status  Message
>>> ----            ------  -------
>>> magma_gulm.so   [OK]    GuLM Plugin v1.0.5
>>> magma_sm.so     [OK]    CMAN/SM Plugin v1.1.7.4
>>>
>>> Magma: 2 plugins available
>>>
>>>   When I added "magma_tool quorum" as well, it reported "Connect
>>> failure: No cluster running?".
>>   I've managed to get an strace of ccsd during the boot and it turned up
>> some interesting lines, which I've interspersed with selected log
>> entries:
>>
>> Jun  8 22:20:27 localhost ccsd[2981]: Starting ccsd 1.0.10: 
>> Jun  8 22:20:27 localhost kernel: CMAN 2.6.9-50.2 (built May 31 2007 15:39:24) installed
>> Jun  8 22:20:27 localhost kernel: NET: Registered protocol family 30
>> Jun  8 22:20:27 localhost ccsd[2981]:  Built: May 31 2007 15:48:09 
>> Jun  8 22:20:27 localhost ccsd[2981]:  Copyright (C) Red Hat, Inc.  2004  All rights reserved. 
>> Jun  8 22:20:27 localhost kernel: DLM 2.6.9-46.16 (built May 31 2007 15:45:51) installed
>> Jun  8 22:20:28 localhost ccsd[2981]: cluster.conf (cluster name = defuturo_test, version = 2) found. 
>> Jun  8 22:20:28 localhost kernel: CMAN: Waiting to join or form a Linux-cluster
>> 2990  22:20:28 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> Jun  8 22:20:29 localhost kernel: CMAN: sending membership request
>> 2990  22:20:29 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> Jun  8 22:20:30 localhost kernel: CMAN: got node tamarillo
>> Jun  8 22:20:30 localhost kernel: CMAN: got node guava
>> Jun  8 22:20:30 localhost kernel: CMAN: quorum regained, resuming activity
>> Jun  8 22:20:30 localhost cman: startup succeeded
>> 2990  22:20:30 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:31 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:32 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:33 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:34 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:35 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:36 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:37 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:38 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:39 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:40 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:41 stat64("/dev/dlm-control", 0xf6f2c104) = -1 ENOENT (No such file or directory)
>> 2990  22:20:42 stat64("/dev/dlm-control", {st_mode=S_IFCHR|0600, st_rdev=makedev(10, 62), ...}) = 0
>> Jun  8 22:20:42 kiwano ccsd[2981]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.7.4
>> Jun  8 22:20:42 kiwano ccsd[2981]: Initial status:: Quorate
>>
>> So, it looks like the problem is that the appearance of /dev/dlm-control
>> is being delayed in the 4U5 cluster.
> 
> 
> That's wrong - it should be created in /dev/misc/dlm-control - maybe you are
> missing the udev config file that puts it in the right place? You should have
> an /etc/udev/rules.d/51-dlm.rules file that does this.
> 

Oops, I got that the wrong way round - dlm-control probably IS being created in
the right place; it's magma that is looking for it in the wrong place. I don't
have the magma version numbers to hand, but I'm pretty sure this was fixed in
CVS, and I would be worried if the fix didn't get into U5.
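
To check which location the device node actually turns up in on a given node,
something like the following Python sketch may help. It is not part of magma or
ccsd; find_char_device is a hypothetical helper name, and the two candidate
paths are simply the ones mentioned in this thread. Which of them a particular
magma build looks at is an assumption you would need to verify against your
installed version.

```python
import os
import stat

# Candidate locations taken from this thread. Which one a given magma
# build actually checks is an assumption, not something confirmed here.
CANDIDATE_PATHS = ["/dev/misc/dlm-control", "/dev/dlm-control"]


def find_char_device(paths):
    """Return the first path that exists and is a character device, else None."""
    for path in paths:
        try:
            st = os.stat(path)
        except OSError:
            # Missing path (ENOENT) or no permission: try the next candidate.
            continue
        if stat.S_ISCHR(st.st_mode):
            return path
    return None


if __name__ == "__main__":
    found = find_char_device(CANDIDATE_PATHS)
    print(found if found else "dlm-control not found in either location")
```

Running it once right after cman starts and again after fenced is started
should show whether the node appears late or in a different directory from
the one being polled in the strace.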

-- 
Patrick

Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street,
Windsor, Berkshire, SL4 ITE, UK.
Registered in England and Wales under Company Registration No. 3798903



