
[dm-devel] Tuning suggestions for large systems with many (6000+) paths



Hello!

 

We have a 3-node database cluster running Oracle, supporting a data warehouse application. To get the throughput we need, we have 4 dual-port HBAs, and the hosts are zoned to 8 or 12 array ports. This gives DM a huge number of paths to manage:

 

root dbxx [/nfshome/jwasilko ] multipathd paths count
Paths: 6000
Busy: False
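As a rough sanity check on where the per-LUN path counts come from: assuming the 4 dual-port HBAs are per host, that is 8 initiator ports. If, as a further assumption not stated above, the array ports are split evenly across two fabrics so that each initiator port sees half of them, each LUN gets 8 × 8/2 = 32 or 8 × 12/2 = 48 paths, which matches the 32- and 48-path maps in the status listing further down. A minimal Python sketch of that arithmetic (the function name and the two-fabric split are hypothetical):

```python
def paths_per_lun(hbas: int, ports_per_hba: int, array_ports: int, fabrics: int = 2) -> int:
    """Paths per LUN under a hypothetical model: every initiator port is
    zoned to an equal share of the array ports (array_ports / fabrics)."""
    initiator_ports = hbas * ports_per_hba
    targets_per_initiator = array_ports // fabrics
    return initiator_ports * targets_per_initiator


# 4 dual-port HBAs zoned to 8 or 12 array ports, split across 2 fabrics (assumed)
for array_ports in (8, 12):
    print(array_ports, "array ports ->", paths_per_lun(4, 2, array_ports), "paths per LUN")
```

If the fabrics are not split this way, the `fabrics` parameter is the knob to change; the point is only that path count grows multiplicatively with initiator and target ports.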

 

We’re running OEL 6.3 with the following packages:

device-mapper-multipath-libs-0.4.9-64.0.1.el6.x86_64
device-mapper-multipath-0.4.9-64.0.1.el6.x86_64

 

We’ve seen issues such as high CPU usage when paths return, and we’re struggling to understand why we see a 30-second I/O stall across all paths when one path is failed (by disabling its port on the FC switch).
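To put the CPU observation in perspective, a back-of-the-envelope estimate of checker load: if multipathd's path checker visited every path exactly once per polling_interval (a simplifying assumption; the real scheduling in this version may differ, for example for healthy versus failed paths), 6000 paths with the polling_interval of 5 from the config below would mean about 1200 TUR checks per second, any of which can wait up to checker_timeout (10 s) on an unresponsive path. A minimal sketch of that model (the helper name is hypothetical):

```python
def checks_per_second(paths: int, polling_interval_s: float) -> float:
    """Average path-checker invocations per second, assuming every path
    is checked exactly once per polling interval (simplified model)."""
    return paths / polling_interval_s


print(checks_per_second(6000, 5))   # current config
print(checks_per_second(6000, 10))  # same path count, doubled interval
```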

 

Here’s our multipath.conf:

 

blacklist {
        devnode "*"
}

blacklist_exceptions {
        devnode "sd*"
}

defaults {
        user_friendly_names     yes
        find_multipaths         yes
        fast_io_fail_tmo        1
        dev_loss_tmo            30
        checker_timeout         10
        failback                immediate
        rr_weight               uniform
        no_path_retry           fail
        max_fds                 8192
        path_checker            tur
        #rr_min_io              8
        rr_min_io_rq            8
        polling_interval        5
        path_grouping_policy    multibus
        path_selector           "round-robin 0"
}

devices {
        device {
                vendor                  "VIOLIN"
                product                 "SAN ARRAY"
                path_grouping_policy    group_by_serial
                getuid_callout          "/sbin/scsi_id --whitelisted --replace-whitespace --page=0x80 --device=/dev/%n"
                features                "1 queue_if_no_path"
                hardware_handler        "0"
        }

        device {
                vendor                  "3PARdata"
                product                 "VV"
                path_grouping_policy    multibus
                path_checker            tur
                no_path_retry           12
                features                "0"
                hardware_handler        "0"
                path_selector           "round-robin 0"
                #path_selector          "queue-length 0"
                rr_weight               uniform
                rr_min_io               100
                failback                immediate
                #fast_io_fail_tmo       1
                #dev_loss_tmo           30
        }
}
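For reference, the queueing behaviour implied by this config can be worked out from two settings. Per the multipath.conf documentation, a numeric no_path_retry is the number of retries (one per checker interval) before queueing is disabled once all paths are down, so the 3PAR section (no_path_retry 12, polling_interval 5) queues for roughly 12 × 5 = 60 s, while maps that inherit no_path_retry fail from defaults stop queueing immediately. That is consistent with the "12 chk" and "off" values in the queueing column of the status listing further down. A minimal sketch of that model (the helper name is hypothetical):

```python
from typing import Optional, Union


def queueing_window_s(no_path_retry: Union[int, str], polling_interval_s: int) -> Optional[int]:
    """Approximate seconds I/O stays queued after the last path fails.

    "fail"  -> 0    (no queueing)
    "queue" -> None (queue indefinitely)
    N       -> N * polling_interval (simplified model)
    """
    if no_path_retry == "fail":
        return 0
    if no_path_retry == "queue":
        return None
    return int(no_path_retry) * polling_interval_s


print(queueing_window_s(12, 5))      # 3PARdata device section
print(queueing_window_s("fail", 5))  # inherited from defaults
```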

 

Thanks,

 

-jeff

 

A little more detail on the number of LUNs we have:

 

 

root dbxx [/nfshome/jwasilko ] multipathd show multipaths status

name                failback  queueing paths dm-st  write_prot

dbxx-data-prd-t2-78 immediate 12 chk   48    active rw       

dbxx-data-prd-t1-21 immediate off      32    active rw       

dbxx-data-prd-t1-22 immediate off      32    active rw       

dbxx-data-prd-t1-23 immediate off      32    active rw       

dbxx-data-prd-t1-18 immediate off      32    active rw       

dbxx-data-prd-t1-24 immediate off      32    active rw       

dbxx-data-prd-t1-19 immediate off      32    active rw       

dbxx-data-prd-t1-25 immediate off      32    active rw       

dbxx-data-prd-t1-20 immediate off      32    active rw       

dbxx-data-prd-t1-27 immediate off      32    active rw       

dbxx-data-prd-t1-29 immediate off      32    active rw       

dbxx-data-prd-t1-28 immediate off      32    active rw       

dbxx-data-prd-t1-30 immediate off      32    active rw       

dbxx-data-prd-t1-31 immediate off      32    active rw       

dbxx-reco2-1        immediate off      32    active rw       

dbxx-reco2-2        immediate off      32    active rw       

dbxx-reco2-3        immediate off      32    active rw       

dbxx-data-prd-t1-26 immediate off      32    active rw       

dbxx-data-prd-t1-17 immediate off      32    active rw       

dbxx-reco2-4        immediate off      32    active rw       

dbxx-vote-0         immediate 12 chk   48    active rw       

dbxx-vote-1         immediate 12 chk   48    active rw       

dbxx-vote-2         immediate 12 chk   48    active rw       

dbxx-acfs-0         immediate 12 chk   48    active rw       

dbxx-acfs-1         immediate 12 chk   48    active rw       

dbxx-acfs-2         immediate 12 chk   48    active rw       

dbxx-acfs-3         immediate 12 chk   48    active rw       

dbxx-acfs-4         immediate 12 chk   48    active rw       

dbxx-acfs-5         immediate 12 chk   48    active rw       

dbxx-acfs-6         immediate 12 chk   48    active rw       

dbxx-acfs-7         immediate 12 chk   48    active rw       

dbxx-arch-0         immediate 12 chk   48    active rw       

dbxx-arch-1         immediate 12 chk   48    active rw       

dbxx-arch-2         immediate 12 chk   48    active rw       

dbxx-arch-3         immediate 12 chk   48    active rw       

dbxx-arch-4         immediate 12 chk   48    active rw       

dbxx-arch-7         immediate 12 chk   48    active rw       

dbxx-arch-6         immediate 12 chk   48    active rw       

dbxx-data-prd-t2-0  immediate 12 chk   48    active rw       

dbxx-arch-5         immediate 12 chk   48    active rw       

dbxx-data-prd-t2-1  immediate 12 chk   48    active rw       

dbxx-data-prd-t2-2  immediate 12 chk   48    active rw       

dbxx-data-prd-t2-4  immediate 12 chk   48    active rw       

dbxx-data-prd-t2-3  immediate 12 chk   48    active rw       

dbxx-data-prd-t2-5  immediate 12 chk   48    active rw       

dbxx-data-prd-t2-6  immediate 12 chk   48    active rw       

dbxx-data-prd-t2-7  immediate 12 chk   48    active rw       

dbxx-data-prd-t2-8  immediate 12 chk   48    active rw       

dbxx-data-prd-t2-9  immediate 12 chk   48    active rw       

dbxx-data-prd-t2-10 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-11 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-12 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-13 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-14 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-15 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-16 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-17 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-19 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-18 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-20 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-22 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-21 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-23 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-24 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-25 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-26 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-27 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-28 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-29 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-31 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-32 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-30 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-33 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-34 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-35 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-36 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-39 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-37 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-38 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-40 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-41 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-42 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-43 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-44 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-45 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-46 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-47 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-48 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-49 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-50 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-51 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-52 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-53 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-54 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-55 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-56 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-57 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-58 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-59 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-60 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-61 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-62 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-63 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-64 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-65 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-66 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-67 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-68 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-69 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-70 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-71 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-72 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-73 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-74 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-75 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-76 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-77 immediate 12 chk   48    active rw       

dbxx-data-prd-t2-79 immediate 12 chk   48    active rw       

dbxx-data-prd-t1-10 immediate off      32    active rw       

dbxx-data-prd-t1-11 immediate off      32    active rw       

dbxx-data-prd-t1-12 immediate off      32    active rw       

dbxx-data-prd-t1-13 immediate off      32    active rw       

dbxx-data-prd-t1-14 immediate off      32    active rw       

dbxx-data-prd-t1-15 immediate off      32    active rw       

dbxx-data-prd-t1-16 immediate off      32    active rw       

dbxx-reco1-1        immediate off      32    active rw       

dbxx-reco1-2        immediate off      32    active rw       

dbxx-reco1-3        immediate off      32    active rw       

dbxx-data-prd-t1-03 immediate off      32    active rw       

dbxx-data-prd-t1-06 immediate off      32    active rw       

dbxx-data-prd-t1-04 immediate off      32    active rw       

dbxx-data-prd-t1-01 immediate off      32    active rw       

dbxx-data-prd-t1-02 immediate off      32    active rw       

dbxx-data-prd-t1-08 immediate off      32    active rw       

dbxx-reco1-4        immediate off      32    active rw       

dbxx-data-prd-t1-05 immediate off      32    active rw       

dbxx-data-prd-t1-07 immediate off      32    active rw       

dbxx-data-prd-t1-09 immediate off      32    active rw       
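The listing above adds up to the 6000 paths reported by multipathd: 99 maps with 48 paths (80 data-prd-t2, 8 acfs, 8 arch, 3 vote) and 39 maps with 32 paths (31 data-prd-t1, 4 reco1, 4 reco2), i.e. 99 × 48 + 39 × 32 = 4752 + 1248 = 6000. A minimal Python sketch that does the same tally from `show multipaths status` text; it assumes the whitespace-separated column layout shown above, where the queueing column may be two words (e.g. "12 chk"), so the path count is taken as the third-from-last token:

```python
from collections import Counter


def tally_paths(status_text: str) -> Counter:
    """Count maps by their path count in `show multipaths status` output.

    Assumed layout: name failback queueing paths dm-st write_prot, where
    queueing may span two tokens, so paths is the 3rd-from-last token."""
    counts: Counter = Counter()
    for line in status_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "name":
            continue  # skip blank lines and the header row
        counts[int(tokens[-3])] += 1
    return counts


sample = """\
name                failback  queueing paths dm-st  write_prot
dbxx-data-prd-t2-78 immediate 12 chk   48    active rw
dbxx-data-prd-t1-21 immediate off      32    active rw
"""
counts = tally_paths(sample)
print(dict(counts))                           # {48: 1, 32: 1}
print(sum(p * n for p, n in counts.items()))  # 80

# Map counts read off the full listing: 99 maps x 48 paths, 39 maps x 32 paths
full = Counter({48: 99, 32: 39})
print(sum(p * n for p, n in full.items()))    # 6000
```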

 

 

--

Jeff Wasilko

Technical Architect, EEMS Platform Ops

 

eBay Enterprise

781 372 4992   M 781 820 0882   F 781 863 8118  

jwasilko ebay com   ebayenterprise.com

 


 



