[Linux-cluster] Problem with rgmanager / rgmanager #37: Error receiving header from 2 sz=0 CTX 0x1f5d420

Digimer lists at alteeve.ca
Fri Sep 21 14:52:58 UTC 2012


On 09/21/2012 06:39 AM, Ralf Aumueller wrote:
> On 09/20/2012 07:54 PM, Digimer wrote:
>> On 09/20/2012 12:21 PM, Ralf Aumueller wrote:
>>> Hello,
>>>
>>> we have a two-node CentOS 6.2 cluster (rgmanager-3.0.12.1-5). After a reboot of
>>> node2 the cluster won't work as expected. On node2 clustat just says:
>>>
>>> clustat:
>>> Cluster Status for cluster1 @ Thu Sep 20 17:06:02 2012
>>> Member Status: Quorate
>>>
>>>    Member Name                                                 ID   Status
>>>    ------ ----                                                 ---- ------
>>>    node1                                                       1 Online
>>>    node2                                                       2 Online, Local
>>>
>>> No services listed, no rgmanager running. Also it is not possible to
>>> start/migrate any services to node2.
>>>
>>> On node1, clustat lists all configured services and shows rgmanager under
>>> Status on both nodes. On node1 the rgmanager.log has lots of:
>>> rgmanager #37: Error receiving header from 2 sz=0 CTX 0x1XXXXXX
>>>
>>> On node2 the rgmanager.log gives me:
>>> rgmanager #34: Cannot get status for service ...
>>>
>>> I did not change the cluster.conf. The only change on node2 was: +48MB and a new
>>> BIOS version (recommended by Dell Support).
>>>
>>> Best regards,
>>> Ralf
>>>
>>
>> Sounds like you hit this bug:
>> http://rhn.redhat.com/errata/RHBA-2012-0897.html
>>
>> Update rgmanager to rgmanager-3.0.12.1-12 and you should be ok.
>
> Did an update of rgmanager on both nodes. Just stopping/starting the
> cluster services didn't resolve the problem. A shutdown of both nodes and then a
> restart solved the problem.
>
> Thanks and best regards,
> Ralf

If I recall correctly, I also had to reboot after the update was
applied. Now that I've been able to remember better, I think this was
caused by the leap second some time back. That leap second hit a lot of
programs, and I believe this includes stuff in the kernel itself.
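
For anyone finding this in the archives and wanting to check whether a node
still carries the older build, here is a minimal sketch. It assumes GNU sort
with -V is available and that the package release ends in a dist tag such as
.el6; FIXED and check_rgmanager are names made up for this example and are not
part of rgmanager itself.

```shell
#!/bin/sh
# Build named earlier in this thread as carrying the fix.
FIXED="3.0.12.1-12"

# Print "ok" if the version-release in $1 is at or above $FIXED,
# otherwise print "update". Hypothetical helper, not an rgmanager tool.
check_rgmanager() {
    # Drop a trailing dist tag such as ".el6" (assumed naming) before comparing.
    ver=${1%.el[0-9]*}
    # GNU sort -V orders version strings; the first line is the lower one.
    lowest=$(printf '%s\n%s\n' "$ver" "$FIXED" | sort -V | head -n 1)
    if [ "$lowest" = "$FIXED" ]; then
        echo ok
    else
        echo update
    fi
}

# On a cluster node, feed it the installed build, for example:
#   check_rgmanager "$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' rgmanager)"
```

Note that, as reported above, having the newer package installed was not
enough on its own; the nodes also had to be rebooted before rgmanager behaved.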

Glad it's resolved!

-- 
Digimer
Papers and Projects: https://alteeve.ca



