[dm-devel] Designing a new prio_callout

Ethan John ethan.john at gmail.com
Wed Aug 15 15:57:09 UTC 2007


Definitely, that would be ideal -- having code on our end that tracked who
was writing to which path. It's a matter of development effort at this
point, and so I'm exploring other options.

You can imagine our system as a number of different machines -- each with
separate network addresses -- that all provide access to the same LUs. Let's
say you have a single target on our system called "mytarget." Users could
log into that target via any one of a number of network addresses (even via
DNS name, I suppose).

So the response from SendTargets is along the lines of:
10.53.152.22:3260,1 iqn.2001-07.com.company:qaiscsi2:mytarget
10.53.152.23:3260,2 iqn.2001-07.com.company:qaiscsi2:mytarget
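
For what it's worth, that listing is roughly what a SendTargets discovery
along these lines returns (the portal address is the one from the example
above):

iscsiadm -m discovery -t sendtargets -p 10.53.152.22:3260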

A user might then initiate two logins to two separate IPs:
iscsiadm -m node --portal 10.53.152.22 \
  --targetname iqn.2001-07.com.company:qaiscsi2:mytarget --login
iscsiadm -m node --portal 10.53.152.23 \
  --targetname iqn.2001-07.com.company:qaiscsi2:mytarget --login

Now what happens? If mytarget has multiple LUs associated with it, the
multipath output will look like it does below when failover is in use --
two paths for each of two devices. The problem for us is that by default,
multipath just uses the first path it sees, which means that for every
device in mytarget, all data is read and written across just the first
path -- 10.53.152.22, in this case.

We need a way to balance load across all available connections.

There are several ways that I can see to do this. Ideally, we would
implement ALUA on our end and advise people to use mpath_prio_alua as
their callout, but that has a development cost. We could also implement a
custom system as you suggest, but that has a development cost too.
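
To sketch what the custom route might look like (purely illustrative: the
name mpath_prio_spread is made up, not an existing callout, and the hash
is just one way of picking a stable winner), the callout could print a
priority derived deterministically from its argument, so a given client
always prefers the same path for a given device while different devices
tend to spread across portals:

#!/bin/sh
# Hypothetical callout; would be wired up as something like:
#   prio_callout "/sbin/mpath_prio_spread /dev/%n"
# Prints 1 or 2 based on a stable hash of the argument, so the result
# is deterministic across invocations (no flapping, unlike
# mpath_prio_random). A real version would key on the portal address
# of the session behind the device, not just the device name.
DEV="$1"
CK=$(printf '%s' "$DEV" | cksum | awk '{print $1}')
echo $(( CK % 2 + 1 ))

Used with group_by_prio and failback immediate (as Stefan suggests below),
something of that shape would at least pin each device to one portal
without the churn we get from random priorities.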

If we could advise users to manually set priorities on the client side, that
would be acceptable, but this is impossible with the current version of
multipath.

As such, the best we can do is to set path priorities randomly, using
mpath_prio_random. This works, but there is a significant cost in terms
of resource usage on our system when the active path changes frequently,
especially in cases where users have thousands of clients connected to
our system and paths are switching constantly. Thus we need to limit the
number of times the active path switches, which the rr_min_io setting
seems to do quite nicely.
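
For the record, the relevant bits of our /etc/multipath.conf currently
look something like this (illustrative; the /dev/%n substitution and
exact keyword spellings are from the multipath-tools of this era and may
differ in other versions):

defaults {
        prio_callout    "/sbin/mpath_prio_random /dev/%n"
        rr_min_io       2000000000
}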

Not sure if that makes any more sense -- I'm trying to be thorough for
the sake of the next guy. The information on the web about all this is
pretty minimal, and getting up to speed has been a painful experience.


On a related note, I've read the reports of people seeing higher
performance with lower settings of rr_min_io, but it seems to me that as
rr_min_io gets smaller, the system behaves less like active/passive MPIO
and more like active/active MPIO. Users seeing that performance
improvement would probably be better off using group_by_serial, so that
all paths are usable simultaneously -- something like the snippet below.
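
A minimal sketch of that configuration (values illustrative):

defaults {
        path_grouping_policy    group_by_serial
        # Small rr_min_io so I/O actually alternates within the group.
        rr_min_io       10
}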

On 8/15/07, Stefan Bader <Stefan.Bader at de.ibm.com> wrote:
>
> Hi Ethan,
>
> I might not understand the problem completely, but I do not see the
> benefit of changing rr_min_io. As far as I can tell from your multipath
> output, both of the devices consist of two path groups with one path
> each. This means that, as long as there is no path failure, I/O will
> never be sent to the inactive group.
> I guess the only thing you need is a script that finds out, for a given
> scsi device (like sdc), whether this would be the preferred path, and
> then prints a number that represents the priority (the lower the number,
> the higher the priority). Then use this as the priority callout and
> group by priority with failback set to immediate.
>
> Regards,
> Stefan
>
> 2007/8/14, Ethan John <ethan.john at gmail.com>:
> > For the record, setting rr_min_io to something extremely large (we're
> > using 2 billion now, since I'm assuming it's a C integer) solves the
> > immediate problem that we're having (overhead in path switching
> > causing poor
>
> > > mpath45 (20002c9020020001a00151b6b46bb57b0) dm-1 company,iSCSI target
> > > [size=15G][features=0][hwhandler=0]
> > > \_ round-robin 0 [prio=1][active]
> > >  \_ 22:0:0:1 sdc 8:32  [active][ready]
> > > \_ round-robin 0 [prio=1][enabled]
> > >  \_ 23:0:0:1 sde 8:64  [active][ready]
> > > mpath44 (20002c9020020001200151b6b46bb57ae) dm-0 company,iSCSI target
> > > [size=15G][features=0][hwhandler=0]
> > > \_ round-robin 0 [prio=1][enabled]
> > >  \_ 22:0:0:0 sdb 8:16  [active][ready]
> > > \_ round-robin 0 [prio=1][enabled]
> > >  \_ 23:0:0:0 sdd 8:48  [active][ready]
> > >
>



-- 
Ethan John
http://www.flickr.com/photos/thaen/
(206) 841.4157

