
Re: [dm-devel] multipath prio_callout broke from 5.2 to 5.3



Thanks, John. I think you and I are doing very similar things, but one part of your technique would cause me problems. I start multipathd at system boot, but my iSCSI devices get connected (and disconnected) later as the system runs, so the list you generate when multipathd starts (mapping /dev/sdX names to their /dev/disk/by-path counterparts) is not available in my case. Still, it seems we are facing the same underlying issue: we want to specify path priorities based on some criterion in the /dev/disk/by-path name. I usually get this from '/sbin/udevadm info --query=env --name=/dev/sdX', and in fact I usually only care about the ID_PATH variable in that output. Would you also be able to get the information you need from this type of output? (If the 'env' query is not enough, maybe 'all' would be better.)

Ben mentioned that if this is a common need, a shared object could be added upstream to make it a general solution. I'm thinking a module could be written that runs this type of query on the device and then looks up the priority in a simple expression file that might look something like:

<regular expression><priority>

In my case I could just look for something like /ID_PATH=ip-10.0.x/ to see whether the device is on the particular network in question, and then set the priority. You might search for entire IQN names instead. This would be flexible enough to let anyone set priority based on the udev parameters: vendor, model, serial number, IQN path, etc.
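As a rough illustration of the idea, such a callout could be sketched in shell. Everything here is an assumption for the sketch, not a tested multipathd setup: the table location (/etc/multipath/prio.table, one <regex><tab><priority> pair per line), the default priority of 1, and the udevadm extraction shown in the trailing comment.

```shell
#!/bin/sh
# Hypothetical prio_callout sketch: map a path device's udev ID_PATH
# to a priority via a regex table.  Table location is an assumption.
PRIO_TABLE=/etc/multipath/prio.table

# Print the priority of the first table regex matching $1; regexes must
# contain no whitespace (the table is split on whitespace by read).
lookup_prio() {
    id_path=$1
    while read -r regex prio; do
        # Skip blank lines and comments.
        case $regex in ''|\#*) continue ;; esac
        if printf '%s\n' "$id_path" | grep -E -q "$regex"; then
            printf '%s\n' "$prio"
            return 0
        fi
    done < "$PRIO_TABLE"
    printf '1\n'    # assumed default priority when nothing matches
}

# In the real callout, ID_PATH would come from udev for the device
# multipathd hands us, e.g.:
#   id_path=$(/sbin/udevadm info --query=env --name="/dev/$1" \
#             | sed -n 's/^ID_PATH=//p')
#   lookup_prio "$id_path"
```

If this ran as a callout, the script and every binary it calls (grep, sed, udevadm) would presumably still need to live somewhere visible in the private namespace, which is exactly the /sbin question raised below.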

I don't know if it is feasible to query udev in this environment -- perhaps someone closer to the internals could answer that. (It looks like the information could also be pulled from /sys, but I'm not too familiar with that structure, and we would need to make sure the code was not too dependent on kernel changes to its layout.)

Thoughts?

-Ty!

John A. Sullivan III wrote:
On Thu, 2009-04-23 at 12:08 -0600, Ty! Boyack wrote:
This thread has been great information, since I'm looking at the same type of thing. However, it raises a couple of (slightly off-topic) questions for me. My recent upgrade to Fedora 10 broke my prio_callout bash script just as you described, but my getuid_callout (a bash script that calls udevadm, grep, sed, and iscsi_id) runs just fine. Are the two callouts handled differently?

Also, is there an easy way to know what tools are already in the private namespace? My prio_callout script calls two other binaries: /sbin/udevadm and grep. If I move to C code, handling grep's functions myself is no problem, but I'm not confident about re-implementing what udevadm does. Can I assume that, since udevadm is in /sbin, it will be available to call via exec()? Or would I be right back where we are with the bash scripting, having to include a dummy device as you described?

Finally, in my case I've got two redundant iSCSI networks: one is 1GbE, and the other is 10GbE. In the past I've always had symmetric paths, so I've used round-robin/multibus. But I want to focus traffic on the 10GbE path, so I was looking at using the prio callout. Is this even necessary? Or will round-robin/multibus take full advantage of both paths? I can see round-robin on that setup resulting in either around 11 Gbps or 2 Gbps, depending on whether the slower link becomes a limiting factor. I'm just wondering if I am making things unnecessarily complex by trying to set priorities.

Thanks for all the help.

-Ty!

I can't answer the questions regarding the internals.  I did make sure
my bash scripts called no external applications, and I placed everything
in /sbin.

I did find I was able to pick and choose which connections had which
priorities - that was the whole purpose of my script.  In my case, there
were many networks and I wanted prioritized failover to try to balance
the load across interfaces and keep failover traffic on the same switch
rather than crossing a bonded link to another switch.  I did it by
cross-referencing the mappings in /dev/disk/by-path with a prioritized list of
mappings.  I believe I posted the entire setup in an earlier e-mail.  If
you'd like, I can post the details again.

As I reread your post a little more closely, I wonder whether using
multibus as you describe might slow you down to the lowest common
denominator.  When I tested with RAID0 across several interfaces to
load balance traffic (this seemed to give better average performance
across a wide range of I/O patterns than multibus with varying
rr_min_io settings), I had three e1000e NICs and one on-board NIC.
When I replaced the on-board NIC with another e1000e, I saw a
substantial performance improvement.  I don't know for sure whether
that will be your experience, but I pass it along as a caveat.
Hope this helps - John


--
-===========================-
 Ty! Boyack
 NREL Unix Network Manager
 ty nrel colostate edu
 (970) 491-1186
-===========================-

