[dm-devel] failover time and failback time

Sat Aug 26 18:39:02 UTC 2006

On Sat, 2006-08-26 at 14:06 -0400, seth vidal wrote:
> On Sat, 2006-08-26 at 13:35 -0400, seth vidal wrote:
> > Hi, 
> 
> 
> <snip>
> > Then I yank one connection on one of the cards in the back of the
> > system.
> > I watch dmesg and I see:
> > qla2300 0000:03:0b.0: LOOP DOWN detected (2).
> > 
> > At this point I would expect multipathd to fail out the paths connected
> > and continue happily. 
> > 
> 
> So, I think I know why multipathd was failing back correctly :)
> 
> It's because it wasn't running. I thought it was but I was wrong.
> 
> However, now I'm seeing this when it tries to failover:
> Aug 26 14:04:10 multipathd: error calling out /sbin/mpath_prio_alua
> 8:240
> Aug 26 14:04:10 kernel: SCSI error : <1 0 3 3> return code = 0x10000
> 
> I've checked /sbin/mpath_prio_alua works to run - so I'm not sure where
> I should look next.

It's so fun learning things in semi-public :)

multipathd is making this callout to verify the path, and it keeps
doing so until the path is restored.

Now, is there any way to tell multipath: "yes, we know, it's down,
stop trying for now because it isn't going to be back"?

Sort of like acknowledging an alert in nagios.

I can think of some controlled 'failures' where I might want to tell it
to be quiet.
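
To make the nagios-style idea concrete, here is a rough sketch in Python
of what I mean by acknowledging a failed path. None of this is
multipathd's actual code or interface: PathChecker, probe, and
acknowledge are names made up for illustration, and the sketch assumes
the checker keeps probing quietly (so it still notices when the path
comes back) rather than stopping altogether.

```python
# Hypothetical sketch only: this does not reflect how multipathd works.
class PathChecker:
    def __init__(self, probe):
        self.probe = probe   # callable(path) -> True if the path is up
        self.acked = set()   # paths an admin has acknowledged as down
        self.log = []        # messages that would otherwise go to syslog

    def acknowledge(self, path):
        """Mark a path as 'known down' so failures stop being reported."""
        self.acked.add(path)

    def check(self, path):
        """Probe one path; report a failure only if it is unacknowledged."""
        if self.probe(path):
            # Path is back: drop the acknowledgement so that a later
            # failure is reported again.
            self.acked.discard(path)
            return True
        if path not in self.acked:
            self.log.append(f"path {path} is down")
        return False
```

With something like this, a controlled 'failure' would be acknowledged
once and then stay quiet until the path is restored and fails again.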

Thanks for putting up with my messages. :)
-sv