[dm-devel] [PATCH 1/1]: missing call to pg_init_done causes I/O to be hung forever

Menny_Hamburger at Dell.com Menny_Hamburger at Dell.com
Tue Dec 14 07:31:31 UTC 2010


Hi,

I tried that as one of my options when I worked on this issue. It works, but it seemed too general to me at the time, since it would require testing additional areas such as other H/W handlers and perhaps other md modules. I do not have the resources to test this, but I would gladly send the other version of the patch.

Best Regards,
Menny



From: dm-devel-bounces at redhat.com [mailto:dm-devel-bounces at redhat.com] On Behalf Of Moger, Babu
Sent: 13 December, 2010 20:03
To: device-mapper development
Subject: Re: [dm-devel] [PATCH 1/1]: missing call to pg_init_done causes I/O to be hung forever

Menny,
   Yes, I agree there is a problem. Wouldn't it be simpler to handle everything in scsi_dh.c? See my responses below.
Thanks
Babu

________________________________
From: dm-devel-bounces at redhat.com [mailto:dm-devel-bounces at redhat.com] On Behalf Of Menny_Hamburger at Dell.com
Sent: Monday, December 13, 2010 9:35 AM
To: dm-devel at redhat.com
Subject: [dm-devel] [PATCH 1/1]: missing call to pg_init_done causes I/O to be hung forever

When scsi_dh_activate returns SCSI_DH_NOSYS, the H/W handler callback is not called, so pg_init_done is never called in
the multipath layer and pending I/O is requeued forever. All userland processes currently performing I/O
on the device then hang. A similar situation occurs when the device has transitioned to SDEV_CANCEL/SDEV_DEL but the device
handler data has not yet been deleted.

The easiest way to reproduce this is in an iSCSI environment:
  dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
  /etc/init.d/iscsi stop
In this example, dd hangs forever and the only way to release it is to reboot the machine.

This patch calls pg_init_done directly from the mpath code when scsi_dh_activate returns anything other than SCSI_DH_OK.
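
For readers who want to see the failure mode in isolation, here is a minimal user-space sketch of it. It is not kernel code: the names model_activate, model_activate_path, model_pg_init_done and struct mpath_model, and the numeric DH_* values, are hypothetical stand-ins, and the single pg_init_in_progress flag is a simplification of the multipath state. It only illustrates that if the activate call fails early without invoking the callback, nothing clears the flag unless the caller invokes the completion itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's SCSI_DH_* return codes. */
enum { DH_OK = 0, DH_NOSYS = 1 };

typedef void (*activate_done_fn)(void *data, int err);

/* Simplified multipath state: I/O stays queued while init is in progress. */
struct mpath_model {
    bool pg_init_in_progress;
    int last_err;
};

/* Models pg_init_done: the only place that clears the in-progress flag. */
static void model_pg_init_done(void *data, int err)
{
    struct mpath_model *m = data;

    m->last_err = err;
    m->pg_init_in_progress = false;
}

/* Models an activate call that fails early: it returns an error
 * and never invokes the completion callback. */
static int model_activate(bool handler_present, activate_done_fn fn, void *data)
{
    if (!handler_present)
        return DH_NOSYS;
    fn(data, DH_OK);
    return DH_OK;
}

/* Models the caller; 'patched' selects the behaviour the patch adds. */
static void model_activate_path(struct mpath_model *m, bool handler_present,
                                bool patched)
{
    int err;

    m->pg_init_in_progress = true;
    err = model_activate(handler_present, model_pg_init_done, m);
    if (patched && err)
        model_pg_init_done(m, err); /* complete directly on early failure */
}

/* Queued I/O cannot make progress while the flag is still set. */
static bool io_is_stuck(const struct mpath_model *m)
{
    return m->pg_init_in_progress;
}
```

In this model, calling model_activate_path with handler_present = false and patched = false leaves io_is_stuck true indefinitely, while patched = true clears it and records the error.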

Note:
The patch is against RHEL 5.5.
When running an upstream kernel, the above scenario may not occur because the request queue is aborted in dm-mpath.c:fail_path.
This patch makes sure the problem does not occur at all, rather than handling it when it does. In addition, it seems too risky to apply
request queue abort functionality on RHEL5 at this stage.

diff -r -U 2 a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
--- a/drivers/md/dm-mpath.c   2010-12-13 09:16:31.358858000 +0200
+++ b/drivers/md/dm-mpath.c   2010-12-13 09:16:31.796998000 +0200
@@ -1190,4 +1190,5 @@
      case SCSI_DH_OK:
            break;
+     case SCSI_DH_DEV_OFFLINED:

If you are not doing anything special here, I would let the default case take care of it. This change is not needed.

      case SCSI_DH_NOSYS:
            if (!m->hw_handler_name) {
@@ -1252,7 +1253,15 @@
 {
      struct pgpath *pgpath = (struct pgpath *) data;
+     int err;

-     scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
+     err = scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
                        pg_init_done, &pgpath->path);
+
+     /*
+     * If error is not SCSI_DH_OK, we have not entered the scsi_dh H/W handler and did not call pg_init_done -
+     * need to call pg_init_done directly.
+     */
+     if (err)
+           pg_init_done(&pgpath->path, err);
 }
You can move this to scsi_dh.c.


diff -r -U 2 a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
--- a/drivers/scsi/device_handler/scsi_dh.c     2010-12-13 09:16:31.616554000 +0200
+++ b/drivers/scsi/device_handler/scsi_dh.c     2010-12-13 09:16:31.878170000 +0200
@@ -443,4 +443,9 @@
      spin_unlock_irqrestore(q->queue_lock, flags);

+     if (sdev->sdev_state == SDEV_CANCEL ||
+         sdev->sdev_state == SDEV_DEL ||
+         sdev->sdev_state == SDEV_OFFLINE)
+           err = SCSI_DH_DEV_OFFLINED;
+
You can change it to something like the following:
      if (err) {
            if (fn)
                  fn(data, err);
            return err;
      }
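
The suggestion above can be sketched as a small user-space model. This is an illustration under stated assumptions, not the actual scsi_dh.c code: model_dh_activate, record_done, struct completion_log and the numeric DH_* values are hypothetical, and the normal path is modelled as completing immediately. The point it shows is that when the activate function itself invokes the callback on an early error, every caller gets its completion without needing its own error check.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's SCSI_DH_* return codes. */
enum { DH_OK = 0, DH_DEV_OFFLINED = 2 };

typedef void (*activate_done_fn)(void *data, int err);

/* Records how often the completion callback ran and with which error. */
struct completion_log {
    int calls;
    int last_err;
};

static void record_done(void *data, int err)
{
    struct completion_log *log = data;

    log->calls++;
    log->last_err = err;
}

/* Models the suggested shape: on an early failure, notify the caller
 * through the callback (if one was supplied) and then return the error. */
static int model_dh_activate(int device_offline, activate_done_fn fn, void *data)
{
    int err = device_offline ? DH_DEV_OFFLINED : DH_OK;

    if (err) {
        if (fn)
            fn(data, err);
        return err;
    }

    /* Normal path: a real handler would complete later via fn;
     * the model completes immediately. */
    if (fn)
        fn(data, DH_OK);
    return DH_OK;
}
```

One consequence of this placement, at least in the model: the caller should not also invoke the completion on a non-zero return, otherwise it would run twice for the same failure.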

