[dm-devel] [PATCH v2][SCSI] scsi_dh: propagate SCSI device deletion
Mike Snitzer
snitzer at redhat.com
Thu Dec 16 15:29:31 UTC 2010
On Thu, Dec 16 2010 at 9:21am -0500,
Menny_Hamburger at Dell.com <Menny_Hamburger at Dell.com> wrote:
> Mike,
>
> What I meant is that removing the dm-mpath change from the patch makes
> the operation path go into an area that I did not test (the default
> case).
> I know it's only a !hw_handler_name and a printk (as I said before -
> this might sound superstitious), however I did my tests with the
> dm-mpath part, and I prefer to send a 100% tested patch then to add
> something I did not test.
> My suggestion was to supply a scsi_dh only patch that goes through the
> same operation path as the one I tested - via the SCSI_DH_NOSYS case
> in pg_init_done. In addition, the separation of offline from
> cancel/del also seems the correct way to do this.
I see, my misunderstanding.
- But you're still returning SCSI_DH_DEV_OFFLINED; so if dm mpath's
pg_init_done isn't updated to explicitly handle it it'll just
fail_path() via the default case. Are you OK with that?
- In your initial patch you said: "When running an upstream kernel,
the above scenario may not occur because the request queue is aborted
when the multipath fails the path."
I'm missing _why_ fail_path's call to blk_abort_queue() would obviate
the need for this patch. blk_abort_queue() is only called from
fail_path(). But I thought the problem is fail_path() isn't called in
this case where the device is SCSI_DH_NOSYS or SCSI_DH_DEV_OFFLINED?
-- due to the missing call to activate_complete callback (pg_init_done)
(again, I'm very interested in this because we'll be reverting DM
mpath's call to blk_abort_queue in the near future).
But this would be the revised scsi_dh patch (I'm not sending to
linux-scsi until I have an answer for the above concerns):
From: Menny Hamburger <Menny_Hamburger at Dell.com>
When the scsi_dh_activate returns SCSI_DH_NOSYS the activate_complete
callback is not called and the error is not propagated to DM mpath.
When a SCSI device attached to a device handler is deleted, userland
processes currently performing I/O on the device will have their I/O
hang forever.
- Set SCSI_DH_NOSYS error when the handler is in the process of being
deleted (e.g. the SCSI device is in a SDEV_CANCEL or SDEV_DEL state).
- Set SCSI_DH_DEV_OFFLINED error when device is in SDEV_OFFLINE state.
- Call the activate_complete callback function directly from
scsi_dh_activate if an error has been set (when either the scsi_dh
internal data has already been deleted or is in the process of being
deleted).
The patch was tested in an ISCSI environment, RDAC H/W handler and
multipath.
In the following reproduction process, dd will I/O hang forever and the
only way to release it will be to reboot the machine:
1) Perform I/O on a multipath device:
dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
2) Delete all slave SCSI devices contained in the mpath device:
I) In an ISCSI environment, the easiest way to do this is by
stopping ISCSI:
/etc/init.d/iscsi stop
II) Another way to delete the devices is by applying the following
bash scriptlet:
dm_devs=$(ls /sys/block/ | grep dm- | xargs)
for dm_dev in $dm_devs; do
devices=$(ls /sys/block/$dm_dev/slaves)
for device in $devices; do
echo 1 > /sys/block/$device/device/delete
done
done
NOTE: when DM mpath uses blk_abort_queue this scsi_dh change isn't
required. However, DM mpath's call to blk_abort_queue has proven to be
unsafe due to a race (between blk_abort_queue and scsi_request_fn) that
can lead to list corruption. Therefore we cannot rely on
blk_abort_queue (dm mpath's blk_abort_queue call will be reverted and
will only be restored once the race with scsi_request_fn is fixed).
Signed-off-by: Menny Hamburger <Menny_Hamburger at Dell.com>
Signed-off-by: Mike Snitzer <snitzer at redhat.com>
---
drivers/scsi/device_handler/scsi_dh.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
index 6fae3d2..b0c56f6 100644
--- a/drivers/scsi/device_handler/scsi_dh.c
+++ b/drivers/scsi/device_handler/scsi_dh.c
@@ -442,12 +442,19 @@ int scsi_dh_activate(struct request_queue *q, activate_complete fn, void *data)
sdev = q->queuedata;
if (sdev && sdev->scsi_dh_data)
scsi_dh = sdev->scsi_dh_data->scsi_dh;
- if (!scsi_dh || !get_device(&sdev->sdev_gendev))
+ if (!scsi_dh || !get_device(&sdev->sdev_gendev) ||
+ sdev->sdev_state == SDEV_CANCEL ||
+ sdev->sdev_state == SDEV_DEL)
err = SCSI_DH_NOSYS;
+ if (sdev->sdev_state == SDEV_OFFLINE)
+ err = SCSI_DH_DEV_OFFLINED;
spin_unlock_irqrestore(q->queue_lock, flags);
- if (err)
+ if (err) {
+ if (fn)
+ fn(data, err);
return err;
+ }
if (scsi_dh->activate)
err = scsi_dh->activate(sdev, fn, data);
More information about the dm-devel
mailing list