
[Linux-cluster] qdisk questions



Hi,

I have recently had a couple of situations with my cluster where both
nodes were restarted simultaneously. The cause is not clear to me, so I
was wondering if anyone could clarify or point me to relevant
documentation.

The following excerpts are from both nodes' logs:

Oct  2 08:32:22 node1 qdiskd[3758]: <info> Heuristic: 'ping 10.X.X.X -c1
-t2' DOWN (3/3)
Oct  2 08:32:39 node1 qdiskd[3758]: <info> Heuristic: 'ping X.X.X.X -c1
-t2' DOWN (6/6)
Oct  2 08:32:55 node1 qdiskd[3758]: <info> Heuristic: 'ping X.X.X.X -c1
-t2' DOWN (6/6)
Oct  2 08:32:58 node1 qdiskd[3758]: <info> Heuristic: 'ping X.X.X.X -c1
-t1' DOWN (6/6)
Oct  2 08:33:01 node1 qdiskd[3758]: <notice> Score insufficient for
master operation (0/4; required=1); downgrading
Oct  2 08:33:01 node1 kernel: md: stopping all md devices.

Oct  2 08:32:23 node2 qdiskd[3599]: <info> Heuristic: 'ping 10.X.X.X -c1
-t2' DOWN (3/3)
Oct  2 08:32:49 node2 qdiskd[3599]: <info> Heuristic: 'ping X.X.X.X -c1
-t2' DOWN (6/6)
Oct  2 08:32:56 node2 qdiskd[3599]: <info> Heuristic: 'ping X.X.X.X -c1
-t1' DOWN (6/6)
Oct  2 08:32:56 node2 qdiskd[3599]: <info> Heuristic: 'ping X.X.X.X -c1
-t2' DOWN (6/6)
Oct  2 08:33:03 node2 qdiskd[3599]: <notice> Score insufficient for
master operation (0/4; required=1); downgrading
Oct  2 08:33:03 node2 kernel: md: stopping all md devices.


Does qdisk reboot the node due to these tests failing?

The upstream routers these nodes are connected to were unavailable for
at most 2 minutes, and all four ping tests require connectivity through
the router (I probably need to change that).
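
To make my reading of the "Score insufficient" line concrete, here is a
small sketch of the arithmetic as I understand it. It is only a model of
the log message, not qdiskd's actual code: the function names are made
up, and I am assuming each of the four ping heuristics contributes a
score of 1 (which would match the "0/4") and that the minimum score is
the "required=1" from the log.

```python
# Hypothetical model of the log line
# "Score insufficient for master operation (0/4; required=1)".
# Assumption: each of the four ping heuristics is worth a score of 1.

def qdisk_score(heuristics_up, scores):
    """Sum the scores of the heuristics that currently pass."""
    return sum(score for up, score in zip(heuristics_up, scores) if up)

def score_sufficient(heuristics_up, scores, min_score):
    """True if the combined score reaches the required minimum."""
    return qdisk_score(heuristics_up, scores) >= min_score

scores = [1, 1, 1, 1]   # assumed: one point per ping heuristic
min_score = 1           # "required=1" in the log

# Router outage: all four pings go through the router, so all fail.
router_down = [False, False, False, False]

# Hypothetical variant: one heuristic does not depend on the router.
one_independent = [True, False, False, False]

print(qdisk_score(router_down, scores), score_sufficient(router_down, scores, min_score))
print(qdisk_score(one_independent, scores), score_sufficient(one_independent, scores, min_score))
```

Under these assumptions, a router outage drops the score to 0 on both
nodes at once, while a single heuristic that does not go through the
router would keep the score at the required minimum.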

What kind of tests can I use for qdiskd that will prevent router outages
from killing my cluster completely?


Regards
--
Denis

