
[dm-devel] multipath timing help sought



Timing help sought, I think.

We have been running an iSCSI multipath setup for about 1.5 years (no real failover so far, other than in testing).
Here is the hardware we are dealing with:
- EqualLogic PS disk array, dual controller modules
- QLogic 4052 HBA
- RHEL 4.5

During the testing phase things worked: if I pulled power to one switch, traffic moved over to the other. But when
the first switch came back, no failback occurred. I was not too concerned about this, as the initial failover worked and
Oracle kept going (if this happened in real life, I figured we would replace the switch and reboot the boxes once things
were back).

The original switch setup did not include a trunk between the switches, which the EqualLogic expects (as we now know). This was our error and,
I'm thinking, the reason failback did not happen. By then everything was in production, and we discovered this during a routine
firmware update of the switches (in scheduled maintenance), which requires a reboot.

One week later (during maintenance again) we put the trunk in place between our iSCSI switches and got spanning tree working on
them. (The iSCSI SAN looks like a square: two sets of switches, with 1G fiber connections on one set of parallel lines.)

My issue is this: I am now seeing many path 'failures' like the ones below, but they are not really failures, as the path comes back
within about 2 seconds. It seems no real I/O is affected at all.

Is this due to a setting in the defaults section of my multipath.conf? I'm thinking rr_min_io or polling_interval. Links all show
good on the switches, with minimal errors (if any).
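
One thing that may help narrow this down is decoding the `return code = 0x20000` that appears in the log snip below. The sketch here assumes the usual Linux SCSI mid-layer layout of the 32-bit result (status, message, host and driver bytes, lowest byte first) and the host-byte names from the kernel's scsi.h; the function and table names are made up for illustration.

```python
# Host-byte values as commonly defined in the Linux kernel's scsi.h
# (assumption: the RHEL4 kernel uses the same numbering).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result into its four one-byte fields."""
    return {
        "status_byte": result & 0xFF,
        "msg_byte": (result >> 8) & 0xFF,
        "host_byte": (result >> 16) & 0xFF,
        "driver_byte": (result >> 24) & 0xFF,
    }


fields = decode_scsi_result(0x20000)
host_name = HOST_BYTE_NAMES.get(fields["host_byte"], "unknown")
print(fields, host_name)
```

Under those assumptions only the host byte is set, and it maps to DID_BUS_BUSY, i.e. the error is reported by the host/transport side rather than as a SCSI status from the target.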


====== snip from /var/log/messages ===========
Sep 27 09:25:45 host kernel: SCSI error : <2 0 3 0> return code = 0x20000
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 161085656
Sep 27 09:25:45 host kernel: device-mapper: dm-multipath: Failing path 8:64.
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 161085664
Sep 27 09:25:45 host kernel: SCSI error : <2 0 3 0> return code = 0x20000
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 119577336
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 119577344
Sep 27 09:25:45 host kernel: SCSI error : <2 0 3 0> return code = 0x20000
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 233247600
Sep 27 09:25:45 host kernel: end_request: I/O error, dev sde, sector 233247608
Sep 27 09:25:45 host multipathd: 8:64: mark as failed
Sep 27 09:25:45 host multipathd: host.datafiles.prod: remaining active paths: 1
Sep 27 09:25:47 host multipathd: 8:64: readsector0 checker reports path is up
Sep 27 09:25:47 host multipathd: 8:64: reinstated
Sep 27 09:25:47 host multipathd: host.datafiles.prod: remaining active paths: 2
Sep 27 09:25:47 host multipathd: host.datafiles.prod: switch to path group #1
Sep 27 09:25:47 host multipathd: host.datafiles.prod: switch to path group #1
========= end snip =========================
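
To put a number on how long each of these path 'failures' lasts, the multipathd lines above can be paired up per path. This is only a rough sketch: the regular expression is written against the two message formats shown in the snip ("mark as failed" and "reinstated"), the function name is made up, and because syslog timestamps carry no year, a placeholder year is assumed.

```python
import re
from datetime import datetime

# Two lines copied from the snip above.
LOG = """\
Sep 27 09:25:45 host multipathd: 8:64: mark as failed
Sep 27 09:25:47 host multipathd: 8:64: reinstated
"""

# Matches only the two multipathd events of interest for a major:minor path.
EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+multipathd:\s+"
    r"(?P<path>\d+:\d+):\s+(?P<event>mark as failed|reinstated)\s*$"
)


def outage_durations(lines, year=2000):
    """Return (path, seconds) for each failed -> reinstated pair.

    `year` is a placeholder, since syslog lines do not record one.
    """
    failed_at = {}
    outages = []
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        path = m["path"]
        if m["event"] == "mark as failed":
            failed_at[path] = ts
        elif path in failed_at:
            outages.append((path, (ts - failed_at.pop(path)).total_seconds()))
    return outages


print(outage_durations(LOG.splitlines()))
```

Since syslog only has one-second resolution, each duration is accurate to roughly a second either way.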

========= /etc/multipath.conf ================
defaults {
        multipath_tool          "/sbin/multipath -v0"
        udev_dir                /dev
        polling_interval        2
        selector                "round-robin 0"
        path_grouping_policy    failover
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        path_checker            readsector0
        prio_callout            "/bin/true"
        features                "0"
        rr_min_io               2
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_names     yes
}
## everything below this is friendly names and ignored devices (omitted)
=========== end ======================
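
For comparing the timing-related settings against what the log shows, the defaults section can be pulled into a dictionary. This is a minimal sketch, not a full multipath.conf parser: it assumes one `keyword value` pair per line, ignores nested sections, and the function name is made up. Only a few of the settings from the config above are repeated here.

```python
# A subset of the defaults section shown above.
CONF = """\
defaults {
        polling_interval        2
        path_checker            readsector0
        rr_min_io               2
        failback                immediate
        no_path_retry           fail
}
"""


def parse_section(text, name):
    """Collect 'keyword value' lines from the first `name { ... }` block."""
    settings = {}
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if not inside:
            inside = line.split() == [name, "{"]
            continue
        if line == "}":
            break
        parts = line.split(None, 1)  # keyword, then the rest as the value
        value = parts[1].strip().strip('"') if len(parts) > 1 else ""
        settings[parts[0]] = value
    return settings


defaults = parse_section(CONF, "defaults")
interval = int(defaults["polling_interval"])
print(f"path checker runs about every {interval}s; no_path_retry={defaults['no_path_retry']}")
```

With polling_interval at 2, the checker would be expected to notice a path is back within about 2 seconds of it being failed, which lines up with the gap between "mark as failed" and "reinstated" in the log.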


--
:wq!
kevin.foote