
[Linux-cluster] NTP time steps cause cluster reconfiguration



Hi,

During testing, I noticed that a time step made by ntpd caused the cluster to drop into the GATHER state:

Jun 16 12:13:16 cp1edidbm001 ntpd[30917]: time reset -16.332117 s
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering GATHER state from 12.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Creating commit token because I am the rep.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Saving state aru 9e high seq received 9e
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] Storing new sequence id for ring 328
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering COMMIT state.
Jun 16 12:13:26 cp1edidbm001 openais[15929]: [TOTEM] entering RECOVERY state.
...

This is easily reproduced by setting the clock forward by 20 seconds with /bin/date. The step probably makes communication timeouts expire prematurely, and almost every time it causes the cluster to reconfigure, luckily without affecting running services.
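
To illustrate the suspected mechanism, here is a minimal Python sketch. It assumes (this is not confirmed from the openais source) that a timeout is measured against wall-clock time. The names SimulatedClock and timer_expired and the 5-second timeout are hypothetical. The wall clock is stepped forward by 20 seconds, as /bin/date did, while a monotonic clock keeps counting only real elapsed time.

```python
class SimulatedClock:
    """Hypothetical clock pair: 'wall' can be stepped, 'mono' only advances."""

    def __init__(self) -> None:
        self.wall = 1000.0  # arbitrary starting wall-clock time, seconds
        self.mono = 0.0     # seconds since an arbitrary monotonic origin

    def advance(self, seconds: float) -> None:
        # Real time passing moves both clocks.
        self.wall += seconds
        self.mono += seconds

    def step(self, seconds: float) -> None:
        # Setting the clock (date, ntpd step) moves only wall time.
        self.wall += seconds


def timer_expired(now: float, started: float, timeout: float) -> bool:
    """True once at least `timeout` seconds appear to have elapsed."""
    return now - started >= timeout


TIMEOUT = 5.0  # hypothetical timeout, seconds

clock = SimulatedClock()
wall_start, mono_start = clock.wall, clock.mono

clock.advance(0.5)  # only half a second of real time passes...
clock.step(20.0)    # ...then the clock is set forward by 20 seconds

wall_fired = timer_expired(clock.wall, wall_start, TIMEOUT)
mono_fired = timer_expired(clock.mono, mono_start, TIMEOUT)
print(f"wall-clock timer fired: {wall_fired}")  # True: premature expiry
print(f"monotonic timer fired:  {mono_fired}")  # False: still waiting
```

If the assumption holds, a timer based on a monotonic clock (for example CLOCK_MONOTONIC on Linux) would not be affected by the step at all.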

Stepping the clock backwards causes a similar disruption, but with a long lag between changing the time and the cluster reconfiguring. Perhaps the backward step extends a timeout or sleep on the affected node, causing genuine timeouts on the other nodes.
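
Under the same unverified assumption (wall-clock-based timeouts), a backward step has the opposite effect: the local timer fires late. This hypothetical sketch counts how much real time passes before such a timer fires. The function name seconds_until_fire, the 5-second timeout and the 0.5-second tick are illustrative only. A peer whose clock was not touched would still notice silence after the normal timeout, which would fit the genuine timeouts suspected on the other nodes.

```python
def seconds_until_fire(timeout: float, step: float, tick: float) -> float:
    """Real seconds before a wall-clock timer fires, when the clock is
    stepped by `step` seconds immediately after the timer is armed."""
    wall = 1000.0        # arbitrary wall-clock time when the timer is armed
    started = wall
    wall += step         # negative step = clock set backwards
    real_elapsed = 0.0
    # Poll the timer once per tick of real time, as a scheduler loop might.
    while wall - started < timeout:
        wall += tick
        real_elapsed += tick
    return real_elapsed


TIMEOUT = 5.0  # hypothetical timeout, seconds
TICK = 0.5     # polling granularity, seconds

peer = seconds_until_fire(TIMEOUT, 0.0, TICK)     # node with an untouched clock
local = seconds_until_fire(TIMEOUT, -20.0, TICK)  # node set back by 20 seconds
print(f"peer timer fires after {peer} s, local timer after {local} s")
```

In this model the local timer fires after 25 seconds of real time instead of 5, leaving a window in which the other nodes could decide the affected node has gone quiet.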

All I am looking for is some reassurance that clock changes are not going to crash the cluster. Can anyone confirm this, please?

regards,

Martin

