[dm-devel] Shell Scripts or Arbitrary Priority Callouts?

Fri Mar 27 18:23:58 UTC 2009

John A. Sullivan III wrote:
> 
> Ah, I forgot another issue in our environment that militates
> against NFS.
> It is very heavily virtualized, so there are not a lot of physical
> devices.  They do have multiple Ethernet interfaces (e.g., there are
> ten GbE interfaces on the SAN).  However, we have a problem aggregating
> throughput.  Because there are only a handful of actual (but very large)
> systems, Ethernet bonding doesn't help us very much.  Anything that
> passes through the ProCurve switches is distributed by MAC address
> (rather than hashed per socket).  Thus, all traffic flows
> through a single interface.

With virtualization it's not about the bandwidth, it's about the random
IOPS. All that random I/O means that most throughput is measured in
hundreds of KB/s, not MB/s. Most shared storage arrays will assemble
sequential reads in their read cache.
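To put numbers on that: throughput is just IOPS multiplied by I/O size, so the same number of operations per second yields very different bandwidth depending on the block size. The 200 IOPS and the 4 KiB / 1 MiB sizes below are hypothetical figures for illustration, not measurements from this environment.

```python
def throughput_kib_per_s(iops: float, io_size_kib: float) -> float:
    """Throughput (KiB/s) = operations per second * KiB per operation."""
    return iops * io_size_kib


# Hypothetical workload figures, chosen only for illustration.
random_4k = throughput_kib_per_s(iops=200, io_size_kib=4)         # small random I/O
sequential_1m = throughput_kib_per_s(iops=200, io_size_kib=1024)  # large sequential I/O

print(f"random 4 KiB:     {random_4k:.0f} KiB/s")
print(f"sequential 1 MiB: {sequential_1m / 1024:.0f} MiB/s")
```

At 800 KiB/s, the random workload sits far below the roughly 125 MB/s raw line rate of a single GbE link, which is why link bandwidth is rarely the bottleneck for VM workloads.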

Consider my suggestion: NFS for VM OS configuration and system
images, then iSCSI inside those VMs for application-specific
storage. If running iSCSI inside the VM is too problematic due to
virtual networking issues, you can always have the host system
run the initiator and present the SCSI devices to the VMs. It's
just a PITA when you re-allocate VMs from one host to another.

> Moreover, OpenSolaris only seems to support 802.3ad bonding.  Our
> ProCurve switches do not support 802.3ad bonding across multiple
> switches, so using it leaves us vulnerable to a single point of failure
> (ironically).  I believe Nortel supports this, though I'm not sure about
> Cisco.  HP is releasing that capability this or next quarter, but not in
> our lowly 2810 models.  We thought we got around the problem by using
> Linux-based balance-xor with layer3+4 hashing, and it seemed to work
> until we noticed that the switch CPU shot to 99% under even light
> traffic loads :(

Yes, the ProCurves and PowerConnects only use IEEE protocols and
IEEE doesn't have a protocol for splitting a LAG yet :-(

And yes, XOR load balancing will bring your switches to their knees
with gratuitous ARPs. I too found that out the hard way... :-(
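To illustrate why hashing on MAC address alone pins all traffic between two hosts to one link, while a layer3+4 hash can spread separate TCP connections, here is a simplified Python model. The hash functions are loosely modeled on the formulas described in the Linux bonding documentation but are not the kernel's exact code; the MAC addresses, IPs, ports, and link count are hypothetical. It does not model the gratuitous-ARP problem, and it only covers the sender's transmit choice (the switch hashes return traffic on its own).

```python
from collections import Counter


def layer2_hash(src_mac: bytes, dst_mac: bytes, n_links: int) -> int:
    # Simplified MAC-based hash: only the MAC pair matters, so every
    # flow between the same two hosts maps to the same link.
    return (src_mac[-1] ^ dst_mac[-1]) % n_links


def layer34_hash(src_ip: int, dst_ip: int, src_port: int, dst_port: int,
                 n_links: int) -> int:
    # Simplified layer3+4-style hash: ports and IPs are folded together,
    # so different connections between the same hosts can diverge.
    h = (src_port << 16) | dst_port
    h ^= src_ip ^ dst_ip
    h ^= h >> 16
    h ^= h >> 8
    return h % n_links


N_LINKS = 4
host_mac = bytes.fromhex("001122334455")  # hypothetical initiator MAC
san_mac = bytes.fromhex("00aabbccdd01")   # hypothetical target MAC
host_ip = 0x0A000001                      # 10.0.0.1 (hypothetical)
san_ip = 0x0A000002                       # 10.0.0.2 (hypothetical)
src_ports = range(50000, 50008)           # eight connections, hypothetical ports
dst_port = 3260                           # iSCSI well-known port

l2 = Counter(layer2_hash(host_mac, san_mac, N_LINKS) for _ in src_ports)
l34 = Counter(layer34_hash(host_ip, san_ip, p, dst_port, N_LINKS)
              for p in src_ports)

print("layer2 links used:  ", dict(l2))
print("layer3+4 links used:", dict(l34))
```

With these inputs, all eight connections share one link under the MAC-only hash, while the layer3+4 model places two connections on each of the four links.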

> That's initially why we wanted to go the RAID0 route, with each interface
> on a different network for aggregating throughput and dm-multipath
> underneath it for fault tolerance.  If we go with NFS, I believe we will
> lose the ability to aggregate bandwidth.  Thanks - John
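For reference, RAID0 striping deals fixed-size chunks round-robin across its member devices, which is what lets one large sequential transfer use several interfaces at once. The Python sketch below computes that mapping; the 64 KiB chunk size and two-device layout are hypothetical, and real md or dm-stripe devices add details (metadata offsets, chunk-size constraints) not modeled here.

```python
def raid0_map(offset: int, chunk_size: int, n_devices: int) -> tuple[int, int]:
    """Map a logical byte offset to (device index, byte offset on that device)."""
    chunk = offset // chunk_size   # which chunk of the striped volume
    device = chunk % n_devices     # chunks are dealt round-robin to devices
    stripe = chunk // n_devices    # full stripes that precede this chunk
    return device, stripe * chunk_size + offset % chunk_size


CHUNK = 64 * 1024  # hypothetical 64 KiB chunk size
DEVICES = 2        # hypothetical: one device per network/interface

for off in (0, CHUNK, 2 * CHUNK, 3 * CHUNK + 100):
    print(off, "->", raid0_map(off, CHUNK, DEVICES))
```

A small random read that falls inside one chunk still touches only one device, so striping mainly helps larger sequential transfers.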

With NFS you can use 802.3ad, which, as the scary man in No Country
for Old Men said, "is the best offer you're going to get". Solaris'
IPMP can protect against the NFS SPoF, and if you couple that with
application-specific iSCSI running over different LAG groups using
multipathing, you can probably make use of all that bandwidth.
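As a rough illustration of how multipathing spreads I/O across several paths (for instance, iSCSI sessions over different LAG groups), here is a toy round-robin path selector in Python. It is modeled loosely on the idea behind dm-multipath's round-robin selector and the rr_min_io setting in multipath.conf (I/Os sent down one path before switching); it is not the kernel implementation, and the path names are hypothetical.

```python
class RoundRobinSelector:
    """Toy round-robin path selector: send rr_min_io I/Os down one
    path, then move on to the next healthy path."""

    def __init__(self, paths: list[str], rr_min_io: int) -> None:
        self.paths = list(paths)
        self.rr_min_io = rr_min_io
        self.failed: set[str] = set()
        self._index = 0   # position of the current path in self.paths
        self._issued = 0  # I/Os issued on the current path so far

    def fail(self, path: str) -> None:
        self.failed.add(path)

    def next_path(self) -> str:
        if all(p in self.failed for p in self.paths):
            raise RuntimeError("no healthy paths left")
        current = self.paths[self._index]
        if current in self.failed or self._issued >= self.rr_min_io:
            # Advance to the next healthy path and reset the counter.
            self._issued = 0
            self._index = (self._index + 1) % len(self.paths)
            while self.paths[self._index] in self.failed:
                self._index = (self._index + 1) % len(self.paths)
        self._issued += 1
        return self.paths[self._index]


sel = RoundRobinSelector(["lag-a", "lag-b"], rr_min_io=2)  # hypothetical paths
first = [sel.next_path() for _ in range(6)]
sel.fail("lag-b")  # simulate losing one LAG group
after_fail = [sel.next_path() for _ in range(3)]
print(first)
print(after_fail)
```

Both paths carry I/O while healthy, and once one fails the selector keeps issuing on the survivor, which is the combination of aggregation and fault tolerance being discussed.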

-Ross
