
Re: [dm-devel] Designing a new prio_callout

I don't think it's possible to get the IP of the iSCSI session from within multipath. If anyone knows a way, I could easily write a dumb version of a callout like you describe.

I'm not convinced, though, that I could do much better than prio_random with rr_min_io > 2 billion without some extensive work. You'd have to re-check path priorities for each call to the script, which would involve walking through all existing paths and priorities and deciding on path priorities in some intelligent way. It would get hairy pretty quickly.

Thanks for the tidbit on how round robin works. Great to know!

On 8/16/07, Stefan Bader <Stefan Bader de ibm com > wrote:
> Now what happens? If mytarget has multiple LUs associated with it, the
> multipath output will look like it did below if failover is being used --
> two paths for each of two devices. The problem for us is that by default,
> multipath just uses the first path that it sees. Which means that for every
> device in mytarget, all data will be read and written across just the first
> path --, in this case.
> We need a way to load balance connections across all available connections.
> There are several ways that I can see to do this. Ideally, we would
> implement ALUA on our end and advise people to use mpath_prio_alua as their
> callout. But this has a development cost. We could also implement a custom
> system as you suggest, but this also has a development cost.
> If we could advise users to manually set priorities on the client side, that
> would be acceptable, but this is impossible with the current version of
> multipath.

Can you find the IP address and UID of a device with the node name?
For example you get /dev/sdc and then look for UID (can be retrieved
with scsi_id) and the IP address of the connection (not sure this is
possible). Then manually create a file containing mappings:


Create a script to use as the callout: it takes a node name, looks it
up in the file, and prints out the priority. This way the priority of a
path does not change the way it does with random priorities. The
lower-priority path will only be used on failure, and I/O will switch
back as soon as the preferred path returns (with failback immediate).
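
A rough sketch of such a callout, in Python, might look like the
following. The mapping file location and its "node_name priority"
line format are assumptions made up for this example, as are the
function names; none of it comes from multipath-tools itself.

```python
#!/usr/bin/env python3
"""Sketch of a static-priority callout: look up a node name in a mapping file."""
import sys

# Hypothetical location and format: one "node_name priority" pair per line.
MAP_FILE = "/etc/multipath_prio.map"
# Priority printed when the node is not listed or the file is unreadable.
DEFAULT_PRIO = 1


def parse_mappings(lines):
    """Return {node_name: priority} from 'name prio' lines; '#' starts a comment."""
    mappings = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            continue  # skip malformed lines rather than failing the callout
        name, prio = fields
        try:
            mappings[name] = int(prio)
        except ValueError:
            continue  # non-numeric priority: ignore the line
    return mappings


def lookup_priority(node, mappings, default=DEFAULT_PRIO):
    """Accept either 'sdc' or '/dev/sdc' and return the configured priority."""
    name = node.rsplit("/", 1)[-1]
    return mappings.get(name, default)


def main(argv):
    if len(argv) != 2:
        print(DEFAULT_PRIO)
        return
    try:
        with open(MAP_FILE) as fh:
            mappings = parse_mappings(fh)
    except OSError:
        mappings = {}
    # The callout's only output is the priority as a single integer.
    print(lookup_priority(argv[1], mappings))


if __name__ == "__main__":
    main(sys.argv)
```

If I remember the multipath.conf syntax correctly, this would be hooked
up with a prio_callout line that passes the node name, something like
"/path/to/script /dev/%n" (the script path is a placeholder). Since the
file is static, the same path always gets the same priority.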

> On a related note, I've read the reports of people experiencing higher
> levels of performance with lower settings of rr_min_io, but it seems to me
> that as rr_min_io gets smaller, the system becomes less like active/passive
> MPIO and more like active/active MPIO, so users experiencing this
> performance improvement would be better off using group_by_serial, so that
> all paths are usable simultaneously.

The setting of rr_min_io only matters if you have more than one path
per path group. Otherwise you can only use one path at a time and
there is no round-robin. If you have more than one path in a group
then lower values help since paths are more likely to be used
concurrently. The default of 1000 is too high. Kiyoshi Ueda and
Jun'ichi Nomura have done some measurements while looking for a way to
improve performance more generally
(https://ols2006.108.redhat.com/2007/Reprints/ueda-Reprint.pdf). But
again, rr_min_io is only relevant to load-balance paths within the
same path group (multibus or as you mentioned group_by_serial). The
reason for your path changes (except for real failures) is more likely
that prio_random results in different priorities whenever any priority
value is checked again.
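
To make the rr_min_io point concrete, here is a toy model in Python. It
is a simplification and not the kernel's actual path selector: it
assumes the selector simply moves to the next path in the group after
every rr_min_io requests. The function names and device names are made
up for the example.

```python
def assign_paths(num_ios, paths, rr_min_io):
    """Return the path chosen for each I/O, assuming the selector moves to
    the next path in the group after every rr_min_io requests."""
    return [paths[(i // rr_min_io) % len(paths)] for i in range(num_ios)]


def count_switches(assignment):
    """Count how often consecutive I/Os go to different paths."""
    return sum(1 for a, b in zip(assignment, assignment[1:]) if a != b)


if __name__ == "__main__":
    group = ["sdc", "sdd"]  # two paths in the same path group
    for rr_min_io in (1000, 100, 10):
        chosen = assign_paths(2000, group, rr_min_io)
        print(rr_min_io, count_switches(chosen))
```

With a single path in the group the number of switches is always zero,
whatever rr_min_io is set to. With two paths, a smaller rr_min_io means
the selector alternates more often, so a short burst of I/O is more
likely to be spread across both paths.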



Ethan John
(206) 841.4157