[Linux-cluster] A better understanding of multicast issues

Sat Feb 12 16:21:44 UTC 2011

On 02/12/2011 05:51 AM, Kit Gerrits wrote:
> 
> Digimer,
> 
> Did you ever get a reply from anyone?
> 
> If what you say is true, failure of one of our HSRP(HA) switches/routers
> might break the cluster.
> (if they don't share multicast memberships)
> 
> I would guess that multicast groups originate in the cluster, not the
> switch.
> In that case, if the switch has been rebooted, the cluster needs to
> re-create the multicast groups on the switch.
> 
> I would guess that the cluster itself needs to check if the switch is
> properly handling multicast.
> (subscribe to its own group and check if the packets are being handled
> correctly)
> 
> This should provide an insight into clustering/multicast:
> http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a008059a9df.shtml
> 
> 
> Regards,
> 
> Kit

Hi Kit,

  I did not, and thank you for replying.

  So, given that it's fairly rare for switches to reset, the frequent
multicast breakdowns are probably caused by the periodic checks done by
the switches. I wonder, then, if corosync, for whatever reason, doesn't
or isn't able to answer those requests quickly enough. Perhaps the
process takes too much time? Corosync will, by default, declare a ring
dead after ~3s.
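
  One way to test this outside of corosync would be a small probe that
joins a multicast group, sends a packet every second, and reports any
peer that goes quiet for longer than the ~3s window. The following is
only a rough sketch under assumptions: the group 239.192.100.1, port
5499, the MCPROBE packet format and all function names are made up for
illustration and are not anything corosync uses. Local loopback is
switched off so that only probes which actually crossed the switch are
counted.

```python
import socket
import struct
import time

GROUP = "239.192.100.1"  # hypothetical test group, not the cluster's own
PORT = 5499              # hypothetical port, kept apart from cluster traffic
MAGIC = b"MCPROBE"       # hypothetical marker identifying our probes


def build_probe(node_id, seq):
    """Encode a probe as MAGIC|node_id|seq (node_id must not contain '|')."""
    return b"|".join([MAGIC, node_id.encode("utf-8"), str(seq).encode("ascii")])


def parse_probe(data):
    """Return (node_id, seq) for a valid probe, or None for anything else."""
    parts = data.split(b"|")
    if len(parts) != 3 or parts[0] != MAGIC:
        return None
    try:
        return parts[1].decode("utf-8"), int(parts[2])
    except (UnicodeDecodeError, ValueError):
        return None


def silent_peers(last_heard, now, limit=3.0):
    """Peers whose most recent probe is older than `limit` seconds."""
    return sorted(p for p, t in last_heard.items() if now - t > limit)


def open_socket(group=GROUP, port=PORT):
    """Bind a UDP socket and join `group` on the default interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    # Without loopback, a received probe must have come over the network.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 0)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    return sock


def run(node_id, duration=600.0, interval=1.0):
    """Send one probe per interval and report peers that have gone quiet."""
    sock = open_socket()
    last_heard = {}
    seq = 0
    end = time.monotonic() + duration
    while time.monotonic() < end:
        sock.sendto(build_probe(node_id, seq), (GROUP, PORT))
        seq += 1
        next_send = time.monotonic() + interval
        # Listen until the next send is due, so the send rate stays fixed.
        while True:
            remaining = next_send - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                data, _addr = sock.recvfrom(1024)
            except socket.timeout:
                break
            probe = parse_probe(data)
            if probe and probe[0] != node_id:
                last_heard[probe[0]] = time.monotonic()
        for peer in silent_peers(last_heard, time.monotonic()):
            print(time.strftime("%H:%M:%S"), "no probe from", peer, "for >3s")
    sock.close()
```

  Calling run(socket.gethostname()) on two or more nodes at the same
time should show whether the gaps line up with the switch's periodic
checks. A peer is only tracked once its first probe has been heard, so
a node that never gets through at all will not be reported.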

  More to think about, and I appreciate that link. Thanks. :)

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org