[Linux-cluster] "openais[XXXX]: [TOTEM] Retransmit List: XXXXX" in /var/log/messages

Bernard Chew bernardchew at gmail.com
Fri Apr 9 08:51:52 UTC 2010


> On Thu, Apr 8, 2010 at 12:58 AM, Steven Dake <sdake at redhat.com> wrote:
> On Wed, 2010-04-07 at 18:52 +0800, Bernard Chew wrote:
>> Hi all,
>>
>> I noticed "openais[XXXX]: [TOTEM] Retransmit List: XXXXX" repeated
>> every few hours in /var/log/messages. What does the message mean, and
>> is it normal? Will this cause fencing to take place eventually?
>>
> This means your network environment dropped packets and totem is
> recovering them. This is normal operation, and in future versions, such
> as corosync, no notification is printed when recovery takes place.
>
> There is a bug, however, fixed in revision 2122, where if the last packet
> in the order is lost and no further packets get through after it, the
> processor will enter a "failed to receive" state and trigger fencing.
>
> Regards
> -steve
>> Thank you in advance.
>>
>> Regards,
>> Bernard Chew
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>

Thank you for the reply Steve!

The cluster was running fine until last week, when 3 nodes restarted
suddenly. I suspect fencing took place, since all 3 servers restarted
at the same time, but I couldn't find any fence-related entries in the
log. Could we have hit the bug you mentioned? If so, would the log
show that fencing had taken place?
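One way to check the log for both kinds of entry around the time of the restarts is a short script like the one below. It is a minimal sketch, assuming the standard syslog line layout (`Mon DD HH:MM:SS host process[pid]: message`) and the `openais[PID]: [TOTEM] Retransmit List:` wording quoted in this thread; the function name `scan` and the plain "fence" keyword match are hypothetical choices, since the exact wording of fence messages differs between cluster versions. It counts retransmit lines per hour and collects every line that mentions fencing, so a burst of retransmits just before the restarts would stand out.

```python
import re
import sys
from collections import Counter

# Assumed syslog layout: "Mon DD HH:MM:SS host process[pid]: message".
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+(?P<msg>.*)$"
)
# Matches the retransmit notice quoted in this thread.
RETRANSMIT_RE = re.compile(r"openais\[\d+\]:\s+\[TOTEM\]\s+Retransmit List:")
# Hypothetical keyword: fence messages are worded differently across versions.
FENCE_KEYWORD = "fence"


def scan(lines):
    """Return (retransmit lines per hour, list of fence-related lines)."""
    retransmits = Counter()  # keeps first-seen (i.e. log) order of hours
    fence_lines = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a syslog-formatted line
        hour = f"{m['month']} {int(m['day']):2d} {m['hour']}:00"
        msg = m["msg"]
        if RETRANSMIT_RE.search(msg):
            retransmits[hour] += 1
        elif FENCE_KEYWORD in msg.lower():
            fence_lines.append(line.rstrip("\n"))
    return retransmits, fence_lines


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as f:
        per_hour, fences = scan(f)
    for hour, count in per_hour.items():
        print(f"{hour}  {count} retransmit line(s)")
    print(f"{len(fences)} fence-related line(s)")
    for line in fences:
        print("  " + line)
```

It is worth running this on every node, since a fence action is normally logged by the node that performs it rather than by the node being fenced. An empty fence list only means no line contains the word "fence"; messages logged on a node that was reset abruptly may never have reached disk.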

Also, I occasionally see the message "kernel: clustat[28328]: segfault at
0000000000000024 rip 0000003b31c75bc0 rsp 00007fff955cb098 error 4".
Is this related to the TOTEM messages, or does it indicate another
problem?
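The fields in the segfault line can be decoded mechanically. The sketch below assumes the x86-64 kernel message layout shown above, where the address, `rip`, `rsp` and `error` fields are hexadecimal and the number after `error` is the x86 page-fault error code; `decode_segfault` and the result keys are hypothetical names. For `error 4` it reports a user-mode read of a page that was not present, at address `0x24`, a pattern that commonly comes from dereferencing a NULL pointer to a structure. This describes only the crash inside `clustat` itself; on its own it does not show whether the crash is related to the TOTEM messages.

```python
import re

# Assumed kernel message layout, as quoted above:
#   "<comm>[<pid>]: segfault at <addr> rip <rip> rsp <rsp> error <code>"
# The address, register and error fields are hexadecimal; the PID is decimal.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Split a segfault line into fields and decode the x86 error-code bits."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "process": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "rip": int(m["rip"], 16),
        # bit 0: 1 = protection violation on a present page, 0 = page not present
        "page_present": bool(err & 0x1),
        "write": bool(err & 0x2),         # bit 1: 1 = write, 0 = read
        "user_mode": bool(err & 0x4),     # bit 2: 1 = fault raised in user mode
        "reserved_bit": bool(err & 0x8),  # bit 3: reserved bit set in a page-table entry
        "instr_fetch": bool(err & 0x10),  # bit 4: fault on an instruction fetch
    }


if __name__ == "__main__":
    line = ("kernel: clustat[28328]: segfault at 0000000000000024 "
            "rip 0000003b31c75bc0 rsp 00007fff955cb098 error 4")
    info = decode_segfault(line)
    print(f"{info['process']} (pid {info['pid']}) faulted at {info['fault_addr']:#x}")
    print("user mode:", info["user_mode"], "| write:", info["write"],
          "| page present:", info["page_present"])
```

Pairing the `rip` value with the load address of the library it falls in (for example from the process's memory map or a core dump) would be the next step in locating the faulting function.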

Regards,
Bernard Chew
