[libvirt-users] Doc: How to use NPIV in libvirt

Osier Yang jyang at redhat.com
Thu Sep 12 12:57:50 UTC 2013


Before posting this to the wiki or elsewhere, I'd like to know whether
there are any suggestions on it, or whether I missed something.


============================================

                   How to use NPIV in libvirt

   I planned to write a document about how to use NPIV in libvirt after
more features are supported, but it looks like I can't wait until then:
I've got lots of questions from both bug reports and mails. So here we go.

   This document tries to summarize what libvirt supports for NPIV so far,
plus the TODO list. Feedback and suggestions are welcome.

1) How to find out which HBA(s) support vHBA

   For libvirt "1.0.4" or newer, you can find out simply with:

     # virsh nodedev-list --cap vports

   "--cap vports" tells "nodedev-list" to output only the devices which
have the "vports" capability, i.e. which support vHBA.

   Also since version "1.0.4", the HBA's XML shows the maximum number of
vports the HBA supports and the number of vports currently in use, e.g.

     # virsh nodedev-dumpxml scsi_host5
     <device>
       <name>scsi_host5</name>
       <parent>pci_0000_04_00_1</parent>
       <capability type='scsi_host'>
         <host>5</host>
         <capability type='fc_host'>
           <wwnn>2001001b32a9da4e</wwnn>
           <wwpn>2101001b32a9da4e</wwpn>
           <fabric_wwn>2001000dec9877c1</fabric_wwn>
         </capability>
         <capability type='vport_ops'>
           <max_vports>164</max_vports>
           <vports>5</vports>
         </capability>
       </capability>
     </device>
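
   If you want to check these numbers from a script, here is a minimal
Python sketch (not part of libvirt) which reads the two counters from the
XML above and computes how many more vHBAs the HBA could take. The function
name "vports_left" is made up for illustration, and the input is assumed to
be the output of "virsh nodedev-dumpxml".

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the "nodedev-dumpxml scsi_host5" output shown above.
SAMPLE_XML = """
<device>
  <name>scsi_host5</name>
  <capability type='scsi_host'>
    <host>5</host>
    <capability type='vport_ops'>
      <max_vports>164</max_vports>
      <vports>5</vports>
    </capability>
  </capability>
</device>
"""


def vports_left(nodedev_xml):
    """Return max_vports - vports, or None if the XML lacks the counters."""
    root = ET.fromstring(nodedev_xml)
    ops = root.find(".//capability[@type='vport_ops']")
    if ops is None:
        return None
    max_vports = ops.findtext("max_vports")
    vports = ops.findtext("vports")
    if max_vports is None or vports is None:
        return None
    return int(max_vports) - int(vports)


print(vports_left(SAMPLE_XML))  # prints 159
```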

   For libvirt older than "1.0.4", it's a bit more complicated:

   First you need to find out all the HBAs, e.g.

     # virsh nodedev-list --cap scsi_host
     scsi_host0
     scsi_host1
     scsi_host2
     scsi_host3
     scsi_host4
     scsi_host5

   Then, to see whether an HBA supports vHBA, check whether its dumped
XML contains the "vport_ops" capability. E.g.

     # virsh nodedev-dumpxml scsi_host3
     <device>
       <name>scsi_host3</name>
       <parent>pci_0000_00_08_0</parent>
       <capability type='scsi_host'>
         <host>3</host>
       </capability>
     </device>

   This shows that "scsi_host3" doesn't support vHBA.

     # virsh nodedev-dumpxml scsi_host5
     <device>
       <name>scsi_host5</name>
       <parent>pci_0000_04_00_1</parent>
       <capability type='scsi_host'>
         <host>5</host>
         <capability type='fc_host'>
           <wwnn>2001001b32a9da4e</wwnn>
           <wwpn>2101001b32a9da4e</wwpn>
           <fabric_wwn>2001000dec9877c1</fabric_wwn>
         </capability>
         <capability type='vport_ops' />
       </capability>
     </device>

   But "scsi_host5" supports it.
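
   The same check can be scripted. The following Python sketch only looks
for the "vport_ops" capability, so it should work on the XML of both older
and newer libvirt. The helper names and the trimmed XML strings are
illustrative assumptions; in practice the XML would come from
"virsh nodedev-dumpxml" for each name listed by
"virsh nodedev-list --cap scsi_host".

```python
import xml.etree.ElementTree as ET


def supports_vhba(nodedev_xml):
    """True if the dumped node device XML has a 'vport_ops' capability."""
    root = ET.fromstring(nodedev_xml)
    return root.find(".//capability[@type='vport_ops']") is not None


def vhba_capable(dumps):
    """Filter a {name: xml} mapping down to the vHBA-capable HBA names."""
    return sorted(name for name, xml in dumps.items() if supports_vhba(xml))


# Trimmed copies of the two dumps above (hypothetical test data).
DUMPS = {
    "scsi_host3": "<device><capability type='scsi_host'>"
                  "<host>3</host></capability></device>",
    "scsi_host5": "<device><capability type='scsi_host'><host>5</host>"
                  "<capability type='vport_ops'/></capability></device>",
}

print(vhba_capable(DUMPS))  # prints ['scsi_host5']
```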

   You might be confused by the difference between the node device naming
style in this document (e.g. scsi_host5) and the one in the RHEL6
Virtualization Guide [1] (pci_10df_fe00_scsi_host_0). This is because
libvirt has two backends for the node device driver: udev and HAL. The udev
backend is preferred over the HAL backend in the internal implementation,
and I think there is good enough reason for that (HAL is in maintenance mode
now). I believe the udev backend is used more widely than the HAL backend,
but if your distribution's packager built libvirt without the udev backend,
don't be surprised by node device names like the ones in [1].

2) How to create a vHBA

   Pick an HBA which supports vHBA, use its "node device name" as the
"parent" of the vHBA, and specify the "wwnn" and "wwpn" in the vHBA's XML.
E.g.

     <device>
       <name>scsi_host6</name>
       <parent>scsi_host5</parent>
       <capability type='scsi_host'>
         <capability type='fc_host'>
           <wwnn>2001001b32a9da5e</wwnn>
           <wwpn>2101001b32a9da5e</wwpn>
         </capability>
       </capability>
     </device>

   Then create the vHBA with the virsh command "nodedev-create" (assuming
the above XML file is named "vhba.xml"):

     # virsh nodedev-create vhba.xml
     Node device scsi_host6 created from vhba.xml

   Since "0.9.10", libvirt generates "wwnn" and "wwpn" automatically if
they are not specified. This means you can create the vHBA with simpler
XML like:

     <device>
       <parent>scsi_host5</parent>
       <capability type='scsi_host'>
         <capability type='fc_host'>
         </capability>
       </capability>
     </device>
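
   If you create vHBAs often, generating the XML may be handier than editing
it by hand. Below is a small Python sketch which produces either of the two
XML forms above. "build_vhba_xml" is a hypothetical helper, not a libvirt
API, and refusing a lone "wwnn" or "wwpn" is just this sketch's choice.

```python
import xml.etree.ElementTree as ET


def build_vhba_xml(parent, wwnn=None, wwpn=None):
    """Build vHBA XML suitable for "virsh nodedev-create"."""
    if (wwnn is None) != (wwpn is None):
        raise ValueError("give both wwnn and wwpn, or neither")
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    scsi_host = ET.SubElement(device, "capability", type="scsi_host")
    fc_host = ET.SubElement(scsi_host, "capability", type="fc_host")
    if wwnn is not None:
        # Without these two elements libvirt (>= 0.9.10) generates them.
        ET.SubElement(fc_host, "wwnn").text = wwnn
        ET.SubElement(fc_host, "wwpn").text = wwpn
    return ET.tostring(device, encoding="unicode")


print(build_vhba_xml("scsi_host5"))
print(build_vhba_xml("scsi_host5", "2001001b32a9da5e", "2101001b32a9da5e"))
```

   Write the result to a file and pass it to "virsh nodedev-create" as
shown above.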

3) How to destroy a vHBA

   As usual, destroying something is always simpler than creating it:

     # virsh nodedev-destroy scsi_host6
     Destroyed node device 'scsi_host6'

   You might already have realized that the vHBA is removed permanently.
Don't be surprised, that's life: the node device driver doesn't support
persistent config. I won't say it's a nightmare for users who scream when
they realize the vHBA disappeared after a system reboot, but it's not good
(assume you got the wwnn:wwpn pair from the storage admin, but didn't record
it). Fortunately, we support persistent vHBAs now; see the next section for
details.

4) How to create a persistent vHBA

   Let's go back over a bit of history first.

   Prior to libvirt "1.0.5", one could define a "scsi" type pool based on a
(v)HBA by its SCSI host name ("host0" in the XML below). E.g.

     <pool type='scsi'>
       <name>poolhba0</name>
       <uuid>e9392370-2917-565e-692b-d057f46512d6</uuid>
       <capacity unit='bytes'>0</capacity>
       <allocation unit='bytes'>0</allocation>
       <available unit='bytes'>0</available>
       <source>
         <adapter name='host0'/>
       </source>
       <target>
         <path>/dev/disk/by-path</path>
         <permissions>
           <mode>0700</mode>
           <owner>0</owner>
           <group>0</group>
         </permissions>
       </target>
     </pool>

   Quite nice? Yeah, at least it looks so, but the problem is that the SCSI
host number is *unstable* (it can change after a system reboot, a kernel
module reload, a vHBA re-creation, etc.), and thus a "scsi" type pool based
on a (v)HBA becomes unstable too. Obviously it doesn't help with the
"persistent vHBA" problem.

   To solve these problems, libvirt "1.0.5" introduced a new XML schema to
indicate the (v)HBA. An example of the XML:

     <pool type='scsi'>
       <name>poolvhba0</name>
       <uuid>e9392370-2917-565e-692b-d057f46512d6</uuid>
       <source>
         <adapter type='fc_host' parent='scsi_host5' wwnn='20000000c9831b4b' wwpn='10000000c9831b4b'/>
       </source>
       <target>
         <path>/dev/disk/by-path</path>
         <permissions>
           <mode>0700</mode>
           <owner>0</owner>
           <group>0</group>
         </permissions>
       </target>
     </pool>

   It allows defining a "scsi" type pool based on either an HBA or a vHBA.
For an HBA, the "parent" attribute can be omitted. For a vHBA, if "parent"
is not specified, libvirt automatically picks the first HBA which supports
vHBA and has not exceeded the maximum number of vports it supports.

   For a pool based on a vHBA, when the pool is started, libvirt checks
whether the specified vHBA (wwnn:wwpn) exists on the host; if it doesn't
exist yet, libvirt creates it automatically. When the pool is stopped, the
vHBA is destroyed. But since the storage driver supports persistent config,
you can easily get a vHBA with the same "wwnn:wwpn" the next time the pool
starts (don't scream if your pool is transient).

   That's not the end of it if you want the vHBA to be created
automatically after a system reboot: you also need to set the pool as
"autostart":

     # virsh pool-autostart poolvhba0
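
   For those who script their setup, here is a minimal Python sketch which
generates the new-style pool XML. "build_fc_pool_xml" is a made-up helper
name. The sketch leaves out "uuid" and "permissions" on the assumption that
libvirt fills in defaults for them, so please verify that on your version.

```python
import xml.etree.ElementTree as ET


def build_fc_pool_xml(name, wwnn, wwpn, parent=None):
    """Build "scsi" pool XML backed by an fc_host (v)HBA."""
    pool = ET.Element("pool", type="scsi")
    ET.SubElement(pool, "name").text = name
    source = ET.SubElement(pool, "source")
    attrs = {"type": "fc_host", "wwnn": wwnn, "wwpn": wwpn}
    if parent is not None:
        # Optional: when omitted, libvirt picks a capable HBA itself.
        attrs["parent"] = parent
    ET.SubElement(source, "adapter", attrs)
    target = ET.SubElement(pool, "target")
    ET.SubElement(target, "path").text = "/dev/disk/by-path"
    return ET.tostring(pool, encoding="unicode")


print(build_fc_pool_xml("poolvhba0", "20000000c9831b4b",
                        "10000000c9831b4b", parent="scsi_host5"))
```

   The output can be saved to a file and fed to "virsh pool-define",
followed by "virsh pool-start" and "virsh pool-autostart".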

   You might wonder why we don't support persistent config for the node
device driver, and create persistent vHBAs there. One reason is that it
would duplicate what the storage pool does. Another reason (the important
one) is that we want to associate the libvirt storage pool/volume with the
domain (see section 6 below).


5) How to find out the LUN's path

   If you have defined a "scsi" type pool based on the (v)HBA, it's simple
to look up which LUNs are attached to the (v)HBA with the virsh command
"vol-list", e.g.

     # virsh vol-list poolvhba0 --details
     Name        Path                                                            Type   Capacity   Allocation
     --------------------------------------------------------------------------------------------------------
     unit:0:2:0  /dev/disk/by-path/pci-0000:04:00.1-fc-0x203500a0b85ad1d7-lun-0  block  20.01 GiB  20.01 GiB

   If you have not defined a "scsi" type pool based on the (v)HBA, you can
find the LUNs of the (v)HBA either with the virsh command
"nodedev-list --tree", or by iterating over sysfs manually.

   To find the LUNs with the virsh command "nodedev-list" (irrelevant
output is omitted):

     # virsh nodedev-list --tree
     +- pci_0000_00_0d_0
     |   |
     |   +- pci_0000_04_00_0
     |   |   |
     |   |   +- scsi_host4
     |   |
     |   +- pci_0000_04_00_1
     |       |
     |       +- scsi_host5
     |           |
     |           +- scsi_host7
     |           +- scsi_target5_0_0
     |           |   |
     |           |   +- scsi_5_0_0_0
     |           |
     |           +- scsi_target5_0_1
     |           |   |
     |           |   +- scsi_5_0_1_0
     |           |
     |           +- scsi_target5_0_2
     |           |   |
     |           |   +- scsi_5_0_2_0
     |           |       |
     |           |       +- block_sdb_3600a0b80005adb0b0000ab2d4cae9254
     |           |
     |           +- scsi_target5_0_3
     |               |
     |               +- scsi_5_0_3_0

   "scsi_host5" is an HBA on my host, it has a LUN named
"block_sdb_3600a0b80005adb0b0000ab2d4cae9254", don't be confused with 
the naming; it's the naming style libvirt uses, and it is meaningful only to
libvirt. It indicates that the LUN has the short device path "/dev/sdb" and
the ID "3600a0b80005adb0b0000ab2d4cae9254":

     # ls /dev/disk/by-id/ | grep 3600a0b80005adb0b0000ab2d4cae9254
     scsi-3600a0b80005adb0b0000ab2d4cae9254
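
   To take such a name apart in a script, something like the Python sketch
below could do. It assumes the "block_<short name>_<ID>" pattern seen in
the one example above, which is an observation rather than a documented
format, and "split_block_nodedev_name" is a hypothetical helper.

```python
def split_block_nodedev_name(name):
    """Split "block_sdb_<ID>" into ("/dev/sdb", "<ID>")."""
    # Split on the first two underscores only, in case the ID has more.
    parts = name.split("_", 2)
    if len(parts) != 3 or parts[0] != "block":
        raise ValueError("not a block node device name: %r" % name)
    return "/dev/" + parts[1], parts[2]


print(split_block_nodedev_name("block_sdb_3600a0b80005adb0b0000ab2d4cae9254"))
```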

   To manually find the LUNs of a (v)HBA:

   First, you need to iterate over all the directories whose names begin
with the SCSI host number of the (v)HBA under "/sys/bus/scsi/devices". E.g.
I will look up the LUNs of the HBA with SCSI host number 5 on my host:

     # ls /sys/bus/scsi/devices/5:* -d
     /sys/bus/scsi/devices/5:0:0:0  /sys/bus/scsi/devices/5:0:1:0
     /sys/bus/scsi/devices/5:0:2:0  /sys/bus/scsi/devices/5:0:3:0

     # ls /sys/bus/scsi/devices/5\:0\:3\:0/block/sdc

   It means scsi_host5 has a LUN attached, with device name "sdc", at
address "5:0:3:0".

     # ls /sys/bus/scsi/devices/5\:0\:1\:0/ | grep block
     device_blocked

   scsi_host5 doesn't have a LUN attached at address "5:0:1:0".
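
   The manual walk above can be sketched in Python as follows. It treats an
address as having a LUN attached only if it has a "block" subdirectory, as
in the two checks above. "luns_of_host" is a made-up name, and the
"sysfs_root" parameter exists only so the function can be tried against a
fake directory tree.

```python
import os


def luns_of_host(host_no, sysfs_root="/sys/bus/scsi/devices"):
    """Map each "H:C:T:L" address of the host to its block device names."""
    prefix = "%d:" % host_no
    luns = {}
    for entry in sorted(os.listdir(sysfs_root)):
        if not entry.startswith(prefix):
            continue
        block_dir = os.path.join(sysfs_root, entry, "block")
        # No "block" directory is taken to mean no disk at this address.
        if os.path.isdir(block_dir):
            luns[entry] = sorted(os.listdir(block_dir))
    return luns


if os.path.isdir("/sys/bus/scsi/devices"):
    print(luns_of_host(5))
```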

   A device name like "sdc" is not stable. To find the stable path, find
the symbolic link which points to the device name. E.g.

     # ls -l /dev/disk/by-path/
     lrwxrwxrwx. 1 root root  9 Sep 10 22:28 pci-0000:00:07.0-scsi-0:0:0:0 -> ../../sda
     lrwxrwxrwx. 1 root root 10 Sep 10 22:28 pci-0000:00:07.0-scsi-0:0:0:0-part1 -> ../../sda1
     lrwxrwxrwx. 1 root root  9 Sep 10 22:28 pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0 -> ../../sdc

   Then "/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0"
is the stable path of the LUN attached at address "5:0:3:0". Of course, you
can use a similar method to get the "by-id | by-uuid | by-label" stable
path.
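
   Instead of reading the "ls -l" output by eye, the lookup can be sketched
in Python like this. "stable_paths" is a hypothetical helper; it simply
resolves every symbolic link in the given directory and keeps the ones which
end up at the device. Pass another directory (e.g. "/dev/disk/by-id") as
"link_dir" for the other kinds of stable path.

```python
import os


def stable_paths(dev_path, link_dir="/dev/disk/by-path"):
    """Return the symlinks in link_dir which resolve to dev_path."""
    wanted = os.path.realpath(dev_path)
    found = []
    for entry in sorted(os.listdir(link_dir)):
        link = os.path.join(link_dir, entry)
        if os.path.islink(link) and os.path.realpath(link) == wanted:
            found.append(link)
    return found


if os.path.isdir("/dev/disk/by-path"):
    print(stable_paths("/dev/sdc"))
```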

6) How to use the LUN in a guest

   Since libvirt "1.0.5", a storage volume can be used as a disk source via
two new attributes ("pool" and "volume") of the disk's "<source>" element.
E.g.

     <disk type='volume' device='disk'>
       <driver name='qemu' type='raw'/>
       <source pool='poolvhba0' volume='unit:0:2:0'/>
       <target dev='hda' bus='ide'/>
     </disk>

   There are lots of advantages to doing so. Since the main purpose of this
document is "how to use", I will only mention two here to persuade you to
use it. First, you don't need to look up the LUN's path yourself. Second,
assume you want to migrate a domain which uses a LUN attached to a vHBA: do
you want to create the vHBA manually on the target host? With the pool, you
can simply define/start a pool with the same config on the target host.

   So, if your libvirt is "1.0.5" or newer, we recommend that you define a
"scsi" type pool based on the (v)HBA, and use the "pool/volume" names to use
the LUN as a disk source.
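
   As an illustration of the pool/volume form, the Python sketch below
generates such a "<disk>" element from the pool and volume names.
"build_volume_disk_xml" is a made-up helper, and the "raw" driver type is
simply copied from the example above.

```python
import xml.etree.ElementTree as ET


def build_volume_disk_xml(pool, volume, target_dev, bus, device="disk"):
    """Build <disk type='volume'> XML which refers to a pool volume."""
    disk = ET.Element("disk", type="volume", device=device)
    ET.SubElement(disk, "driver", name="qemu", type="raw")
    ET.SubElement(disk, "source", pool=pool, volume=volume)
    ET.SubElement(disk, "target", dev=target_dev, bus=bus)
    return ET.tostring(disk, encoding="unicode")


print(build_volume_disk_xml("poolvhba0", "unit:0:2:0", "sda", "scsi"))
```

   The volume name is the "Name" column printed by "virsh vol-list".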

   You can either use the LUN as a qemu emulated disk, or pass it through
to the guest.

   To use it as a qemu emulated disk, specify the "device" attribute as
"device='disk|cdrom|floppy'". E.g.

     <disk type='volume' device='disk'>
       <driver name='qemu' type='raw'/>
       <source pool='blk-pool0' volume='blk-pool0-vol0'/>
       <target dev='hda' bus='ide'/>
     </disk>

   Or (using the LUN's path directly, with disk type "block"):

     <disk type='block' device='disk'>
       <driver name='qemu' type='raw'/>
       <source dev='/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0'/>
       <target dev='sda' bus='scsi'/>
     </disk>

   To pass the LUN through, specify the "device" attribute as
"device='lun'", e.g.

     <disk type='block' device='lun'>
       <driver name='qemu' type='raw'/>
       <source dev='/dev/disk/by-path/pci-0000:04:00.1-fc-0x203400a0b85ad1d7-lun-0'/>
       <target dev='sda' bus='scsi'/>
     </disk>

7) Future work

   * NPIV based SCSI host passthrough
     That's what the users ask: How to passthrough a (v)HBA to guest?
   * Expose vendor information, LUN's path, state of (v)HBA in its XML
   * Maybe a virsh command to simplify vHBA creation with options

[1] http://www.linuxtopia.org/online_books/rhel6/rhel_6_virtualization/rhel_6_virtualization_chap-Para-virtualized_Windows_Drivers_Guide-N_Port_ID_Virtualization_NPIV.html

Regards,
Osier



