[libvirt] [PATCH v3 10/30] schemas: Introduce disk type NVMe

Michal Privoznik mprivozn at redhat.com
Tue Dec 10 16:10:10 UTC 2019


On 12/9/19 11:55 PM, Cole Robinson wrote:
> On 12/2/19 9:26 AM, Michal Privoznik wrote:
>> There is this class of PCI devices that act like disks: NVMe.
>> Therefore, they are both PCI devices and disks. While we already
>> have <hostdev/> (and can assign an NVMe device to a domain
>> successfully), we don't have a disk representation. There are three
>> problems with PCI assignment in the case of an NVMe device:
>>
>> 1) domains with <hostdev/> can't be migrated
>>
>> 2) the NVMe device is assigned as a whole; there's no way to assign
>>     only a namespace
>>
>> 3) because hypervisors see <hostdev/>, they don't put their block
>>     layer on top of it, so users don't get all the fancy features
>>     like snapshots
>>
>> NVMe namespaces are a way of splitting the non-volatile storage of
>> one NVMe controller into smaller pieces, effectively creating smaller
>> NVMe devices (which can then be partitioned, LVMed, etc.).
>>
>> Because of all of this, the following XML was chosen to model an
>> NVMe device:
>>
>>    <disk type='nvme' device='disk'>
>>      <driver name='qemu' type='raw'/>
>>      <source type='pci' managed='yes' namespace='1'>
>>        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>>      </source>
>>      <target dev='vda' bus='virtio'/>
>>    </disk>
>>
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>>   docs/formatdomain.html.in            | 57 +++++++++++++++++++++++--
>>   docs/schemas/domaincommon.rng        | 32 ++++++++++++++
>>   tests/qemuxml2argvdata/disk-nvme.xml | 63 ++++++++++++++++++++++++++++
>>   3 files changed, 149 insertions(+), 3 deletions(-)
>>   create mode 100644 tests/qemuxml2argvdata/disk-nvme.xml
>>
>> diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
>> index 6df4a8b26e..fe871d933f 100644
>> --- a/docs/formatdomain.html.in
>> +++ b/docs/formatdomain.html.in
>> @@ -2944,6 +2944,13 @@
>>       </backingStore>
>>       <target dev='vdd' bus='virtio'/>
>>     </disk>
>> +  <disk type='nvme' device='disk'>
>> +    <driver name='qemu' type='raw'/>
>> +    <source type='pci' managed='yes' namespace='1'>
>> +      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>> +    </source>
>> +    <target dev='vde' bus='virtio'/>
>> +  </disk>
>>   </devices>
>>   ...</pre>
>>   
>> @@ -2957,7 +2964,8 @@
>>               Valid values are "file", "block",
>>               "dir" (<span class="since">since 0.7.5</span>),
>>               "network" (<span class="since">since 0.8.7</span>), or
>> -            "volume" (<span class="since">since 1.0.5</span>)
>> +            "volume" (<span class="since">since 1.0.5</span>), or
>> +            "nvme" (<span class="since">since 5.6.0</span>)
> 
> 6.0.0 or whatever version this will land in
> 
>>               and refer to the underlying source for the disk.
>>               <span class="since">Since 0.0.3</span>
>>               </dd>
>> @@ -3140,6 +3148,43 @@
>>                 <span class="since">Since 1.0.5</span>
>>                 </p>
>>                 </dd>
>> +            <dt><code>nvme</code></dt>
>> +              <dd>
>> +              To specify the disk source for an NVMe disk, the <code>source</code>
>> +              element has the following attributes:
>> +              <dl>
>> +                <dt><code>type</code></dt>
>> +                <dd>The type of address specified in the <code>address</code>
>> +                sub-element. Currently, only the <code>pci</code> value is
>> +                accepted.
>> +                </dd>
>> +
>> +                <dt><code>managed</code></dt>
>> +                <dd>This attribute instructs libvirt to detach the NVMe
>> +                controller automatically on domain startup (<code>yes</code>)
>> +                or to expect the controller to have been detached by the system
>> +                administrator (<code>no</code>).
>> +                </dd>
>> +
>> +                <dt><code>namespace</code></dt>
>> +                <dd>The namespace ID which should be assigned to the domain.
>> +                According to the NVMe standard, namespace numbers start
>> +                from 1.
>> +                </dd>
>> +              </dl>
>> +
>> +              The difference between <code><disk type='nvme'></code>
>> +              and <code><hostdev/></code> is that the latter is plain
>> +              host device assignment with all its limitations (e.g. no live
>> +              migration), while the former makes the hypervisor run the NVMe
>> +              disk through its block layer, thus enabling all the
>> +              features provided by that layer (e.g. snapshots, domain
>> +              migration, etc.). Moreover, since the NVMe disk is unbound
>> +              from its PCI driver, the host kernel storage stack is not
>> +              involved (compared to passing, say, <code>/dev/nvme0n1</code> via
>> +              <code><disk type='block'></code>), and therefore lower
>> +              latencies can be achieved.
>> +              </dd>
>>             </dl>
>>           With "file", "block", and "volume", one or more optional
>>           sub-elements <code>seclabel</code>, <a href="#seclabel">described
>> @@ -3302,11 +3347,17 @@
>>               initiator IQN needed to access the source via mandatory
>>               attribute <code>name</code>.
>>             </dd>
>> +          <dt><code>address</code></dt>
>> +          <dd>For a disk of type <code>nvme</code>, this element
>> +            specifies the PCI address of the host NVMe
>> +            controller.
>> +            <span class="since">Since 5.6.0</span>
> 
> Same
> 
>> +          </dd>
>>           </dl>
>>   
>>           <p>
>> -        For a "file" or "volume" disk type which represents a cdrom or floppy
>> -        (the <code>device</code> attribute), it is possible to define
>> +        For a "file" or "volume" disk type which represents a cdrom or
>> +        floppy (the <code>device</code> attribute), it is possible to define
> 
> Stray change?

Oh right. I've realigned this area when adding the address description. 
But this change does not belong here.

> 
> Also, in the test XML you need to "s/qemu-system-i686/qemu-system-i386/"
> or you'll hit a weird error. And VIR_TEST_REGENERATE_OUTPUT is also
> busted, see my patches elsewhere on this list.

Yeah, I've noticed Dan posted patches after these. I've fixed that 
locally but never replied to this patch. Sorry.

> 
> Reviewed-by: Cole Robinson <crobinso at redhat.com>

Thanks,
Michal
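The <source> rules documented in the patch (only type='pci' is accepted, managed is yes/no, namespace IDs start at 1, <address> is the host controller's PCI address) can be illustrated with a minimal Python sketch run against the example XML from the commit message. This is not libvirt code: libvirt parses and validates this XML in C. The helper name parse_nvme_disk is hypothetical, and treating a missing managed attribute as 'no' and requiring namespace are assumptions of this sketch, not documented behavior.

```python
import xml.etree.ElementTree as ET

# Example taken from the commit message of the patch.
DISK_XML = """
<disk type='nvme' device='disk'>
  <driver name='qemu' type='raw'/>
  <source type='pci' managed='yes' namespace='1'>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
"""


def parse_nvme_disk(xml_text):
    """Hypothetical helper: check the documented <source> rules of an NVMe disk.

    Illustration only; this is not libvirt's (C) parser.
    """
    disk = ET.fromstring(xml_text.strip())
    if disk.get("type") != "nvme":
        raise ValueError("not a <disk type='nvme'>")

    source = disk.find("source")
    if source is None:
        raise ValueError("missing <source>")

    # Currently only PCI addresses are accepted.
    if source.get("type") != "pci":
        raise ValueError("<source type> must be 'pci'")

    # 'yes': libvirt detaches the controller on domain startup;
    # 'no': the administrator is expected to have detached it.
    # Treating a missing attribute as 'no' is an ASSUMPTION of this sketch.
    managed = source.get("managed", "no")
    if managed not in ("yes", "no"):
        raise ValueError("managed must be 'yes' or 'no'")

    # Per the NVMe standard, namespace IDs start at 1.
    # Requiring the attribute is an ASSUMPTION of this sketch.
    ns = source.get("namespace")
    if ns is None or int(ns) < 1:
        raise ValueError("namespace must be an integer >= 1")

    addr = source.find("address")
    if addr is None:
        raise ValueError("missing <address> of the host NVMe controller")
    # int(x, 16) accepts the '0x' prefix used in the XML.
    domain, bus, slot, function = (
        int(addr.get(k), 16) for k in ("domain", "bus", "slot", "function")
    )

    target = disk.find("target")
    return {
        "managed": managed == "yes",
        "namespace": int(ns),
        # Conventional DDDD:BB:SS.F notation of the host PCI address.
        "pci_address": f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}",
        "target": target.get("dev") if target is not None else None,
    }


if __name__ == "__main__":
    print(parse_nvme_disk(DISK_XML))
```

For the example XML this reports a managed controller at host PCI address 0000:01:00.0, namespace 1, exposed to the guest as vda; changing namespace='1' to namespace='0' makes the sketch reject it.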




More information about the libvir-list mailing list