Re: [dm-devel] dm-mpath-rdac.patch problem
- From: Brian De Wolf <bldewolf csupomona edu>
- To: device-mapper development <dm-devel redhat com>
- Cc: linux-scsi vger kernel org
- Subject: Re: [dm-devel] dm-mpath-rdac.patch problem
- Date: Fri, 13 Jul 2007 12:33:03 -0700
Andrew Vasquez wrote:
> On Thu, 12 Jul 2007, Mike Anderson wrote:
>> Copying this mail to linux-scsi and Ccing Andrew Vasquez to possibly
>> provide input on the Qlogic behavior.
>> Chandra Seetharaman <sekharan us ibm com> wrote:
>>> On Thu, 2007-07-12 at 18:35 -0700, Brian De Wolf wrote:
>>>> Hello All,
>>>> I'm not sure if this is the right place for this, but it seems to be the only
>>>> mailing list related to dm, multipath, and rdac, as far as I can tell. I've
>>>> been trying out the dm-mpath-rdac patch (both yesterday's and previous) with
>>>> gentoo's unstable 2.6.22 kernel, on a Sun x4100 through a QLA2422 HBA (firmware
>>>> ql2400_fw.bin.4.00.27) to an IBM DS4000. I am using a version of
>>>> multipath-tools that I got with git a few days ago.
>>>> I've got multipath working, it reports the hwhandler correctly ([hwhandler=1
>>>> rdac]), and the volume is mountable, etc. It also shows one link as active, the
>>>> other as ghost. However, once the active link dies, the volume becomes read
>>>> only, and both connections are listed as failed. Most importantly, something
>>>> like this shows up in the logs:
>>>> Jul 12 17:11:15 jimbo kernel: device-mapper: multipath rdac: queueing
>>>> MODE_SELECT command on 8:32
>>> It does look like the rdac hardware handler is doing the right thing and
>>> the qlogic is dying for some reason.
>>> I have tested this code in both RHEL5 and SLES10 environments (qla23xx)
>>> and they work fine. Can you try in one of those and see if it is any
>>> Just an FYI w.r.t multipath tools: please remove the patch
>>> tools/.git;a=commit;h=e1e1a1bfb2cf76bfd1a49335e3deec5360fb09db from your
>>> tree for the tools to calculate the path priorities properly.
>>>> Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: ISP System Error - mbx1=0h
>>>> mbx2=8012h mbx3=8002h.
>>>> Jul 12 17:11:15 jimbo kernel: qla2xxx 0000:02:01.1: Firmware has been previously
>>>> dumped (ffffc2000171d000) -- ignoring request...
>>>> Jul 12 17:11:16 jimbo kernel: qla2xxx 0000:02:01.1: Performing ISP error
>>>> recovery - ha= ffff81007e85c530.
> Hmm yes, there are some real problems going on within the firmware which
> we need to triage. From the snippet above, the driver was able to
> capture a firmware-dump of a failure (not sure of the timing and how
> it relates to the window in which you recognized a 'problem'), but
> I'll need you to 'capture' the firmware trace and forward it along to
> us to inspect.
> 1) download the following shell script:
> 2) copy the script to the host (/tmp) which is experiencing the
> 3) reboot and load the driver with the ql2xextended_error_logging
> module parameter set to 1. e.g.:
> $ insmod qla2xxx.ko ql2xextended_error_logging=1
> 4) rerun your test and monitor the kernel-messages file for a message
> similar to:
> Firmware dump saved to temp buffer (1/adcdabcd)
> 5) To retrieve the dump, go to a console and type the following:
> # cd /tmp/
> # ./qla_dmp.sh 1
> The value passed to qla_dmp.sh should be the same as the first integer
> in the 'saved to temp buffer' string (in this example, 1). If the
> operation was successful, a message like the following should be displayed:
> Firmware dumped to file fw_dump_1_20041217_023222.txt.gz
> Forward the file.
> 6) forward over the /var/log/messages file of the driver load and
> failure snippet.
> Not sure which firmware version you are running, but an additional
> datapoint which may be useful after you send the firmware-dump is to
> download the latest 24xx firmware file from QLogic.com:
> and retry the test. If you still see problems, and see a similar
> 'Firmware dump saved...' message, follow the steps above again and
> forward the same datapoints.
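For anyone repeating steps 4 and 5 of the quoted instructions, here is a minimal sketch of pulling the buffer index out of the kernel log so it can be passed to qla_dmp.sh. The function name dump_indexes is made up for illustration, and the pattern assumes the message has exactly the form shown in the quoted example ("Firmware dump saved to temp buffer (1/adcdabcd)"); the real driver output may differ.

```shell
#!/bin/sh
# Hypothetical helper, not part of the qla2xxx driver or of qla_dmp.sh.
# Reads kernel log lines on stdin and prints the first integer inside the
# parentheses of every "Firmware dump saved to temp buffer (N/xxxx)" line.
dump_indexes() {
    sed -n 's/.*Firmware dump saved to temp buffer (\([0-9][0-9]*\)\/[^)]*).*/\1/p'
}

# Example use (assumes the log lives in /var/log/messages and the script
# was copied to /tmp, as in the quoted steps):
#   idx=$(dump_indexes < /var/log/messages | tail -n 1)
#   [ -n "$idx" ] && (cd /tmp && ./qla_dmp.sh "$idx")
```

Lines that do not contain the message produce no output, so an empty result means no dump has been saved yet.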
I have tried both the ql2400_fw.bin.4.00.18 and ql2400_fw.bin.4.00.27 firmwares
and the HBA had the same error. The attached datapoints were done using
Note: This is a resend to the mailing list without attachments.
>>>> While this may be something for the maintainer of the qla2xxx module (I can't
>>>> figure out where I'd send it, in that case...) I think it may be of interest
>>>> that the dm_rdac module tries to push something over the HBA that causes it to
>>>> bail completely and start from scratch (it starts init processes and loading
>>>> firmware again).
>>>> Not to say that I'm not interested in any help getting this working, that is.
>>>> If you have any suggestions on how to get this working, I'd love to hear them.
>>>> I'm also willing to guinea pig some testing if you need it (This box still has a
>>>> bit before it will have to be put in use). I may use redhat to ensure that it's
>>>> not just a broken HBA, but for the long run we would like it to join our gentoo
>>>> Brian De Wolf
>>>> PS- If the subject misled you because you feel that this is just a qla2xxx
>>>> problem, I'm sorry for wasting your time.
> Andrew Vasquez