Mike Christie wrote:
For starters we should just send a netlink event when fast_io_fail has fired. We could easily integrate that into multipathd and would gain an instant benefit from it, as we can switch paths in advance. The next step would be to implement an additional sdev state which would return 'DID_TRANSPORT_FASTFAIL' for any 'normal' I/O; it would be inserted between 'RUNNING' and 'CANCEL'. Transition would be possible between 'RUNNING' and 'FASTFAIL', but it would only be possible to transition into 'CANCEL' from 'FASTFAIL'.
Yeah, a new sdev state might be nice. Right now this state is handled by the classes. For iscsi and FC the port/session will be in blocked/ISCSI_SESSION_FAILED. Then internally the classes decide what to do with IO in the *_chkready functions.
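To make the proposed transitions concrete, here is a rough standalone sketch of how I read them. This is plain userspace C, not scsi-ml code: the enum, the function names and the SKETCH_ prefix are all made up for illustration, and it models only the three states mentioned in the proposal. It also assumes the stricter reading of the last sentence, namely that 'CANCEL' can only be entered from 'FASTFAIL'.

```c
#include <stdbool.h>

/* Hypothetical, simplified state set; not the kernel's enum scsi_device_state. */
enum sketch_sdev_state {
	SKETCH_RUNNING,
	SKETCH_FASTFAIL,	/* proposed state: fast_io_fail has fired */
	SKETCH_CANCEL,
};

/* Return true if the proposal, as read here, allows oldstate -> newstate. */
static bool sketch_transition_allowed(enum sketch_sdev_state oldstate,
				      enum sketch_sdev_state newstate)
{
	switch (newstate) {
	case SKETCH_FASTFAIL:
		return oldstate == SKETCH_RUNNING;
	case SKETCH_RUNNING:
		/* the device may go back to RUNNING from FASTFAIL */
		return oldstate == SKETCH_FASTFAIL;
	case SKETCH_CANCEL:
		/* assumption: CANCEL is only entered from FASTFAIL */
		return oldstate == SKETCH_FASTFAIL;
	}
	return false;
}

/* Returns 0 and updates *state on success, -1 if the transition is rejected. */
static int sketch_set_state(enum sketch_sdev_state *state,
			    enum sketch_sdev_state newstate)
{
	if (!sketch_transition_allowed(*state, newstate))
		return -1;
	*state = newstate;
	return 0;
}
```

With this reading, a device in SKETCH_RUNNING cannot go straight to SKETCH_CANCEL; it has to pass through SKETCH_FASTFAIL first, and once in SKETCH_CANCEL it cannot return to SKETCH_RUNNING.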
How about setting the device to the offline state for the case where fast_io_fail has fired but dev_loss_tmo has not yet fired? As far as failing IO goes, we get the same result. scsi-ml would fail the incoming IO before it reaches the class *_chkready functions, but the scsi device state would indicate that the device cannot execute IO, which might be nice for users.
Or can we not do this because offline for the device is only meant for when scsi-eh has put it offline because it could not recover it? Or is it more generic, applying any time the device cannot execute IO?
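To illustrate the difference I mean between the two places IO can be failed, here is a toy model, again in standalone C with made-up names (nothing here is a real scsi-ml or transport class interface). It only assumes that the device state is checked before the class check is called, so an offline device never reaches the class.

```c
#include <stdbool.h>

/* Hypothetical, simplified types for illustration only. */
enum sketch_dev_state { SKETCH_DEV_RUNNING, SKETCH_DEV_OFFLINE };

enum sketch_result {
	SKETCH_OK,
	SKETCH_FAILED_BY_MIDLAYER,	/* failed on the device state check */
	SKETCH_FAILED_BY_CLASS,		/* failed by the class check */
};

struct sketch_dev {
	enum sketch_dev_state state;
	bool port_fast_failed;	/* class view: fast_io_fail has fired */
};

/* Stand-in for a transport class *_chkready style check. */
static bool sketch_class_chkready(const struct sketch_dev *dev)
{
	return !dev->port_fast_failed;
}

static enum sketch_result sketch_queue_io(const struct sketch_dev *dev)
{
	/* The device state is checked first, so offline devices stop here. */
	if (dev->state == SKETCH_DEV_OFFLINE)
		return SKETCH_FAILED_BY_MIDLAYER;
	if (!sketch_class_chkready(dev))
		return SKETCH_FAILED_BY_CLASS;
	return SKETCH_OK;
}
```

Either way the IO fails; the only difference in this model is which layer fails it and whether the device state itself tells the user that the device cannot execute IO.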