[libvirt] [RFC] fine-grained disk driver options control

Daniel P. Berrange berrange at redhat.com
Thu Mar 16 15:52:07 UTC 2017


On Thu, Mar 16, 2017 at 04:35:36PM +0100, Kevin Wolf wrote:
> On 16.03.2017 at 16:08, Daniel P. Berrange wrote:
> > On Thu, Mar 16, 2017 at 06:00:46PM +0300, Denis V. Lunev wrote:
> > > On 03/16/2017 05:45 PM, Daniel P. Berrange wrote:
> > > > On Thu, Mar 16, 2017 at 05:08:57PM +0300, Denis V. Lunev wrote:
> > > >> Hello, All!
> > > >>
> > > >> There is a problem in the current libvirt implementation: domain.xml
> > > >> allows specifying only a basic set of options. This is especially
> > > >> painful in the case of QEMU, where the format drivers have really a
> > > >> lot of tweaks. Most likely these options will never be supported in
> > > >> a good way in libvirt as recognizable entities.
> > > >>
> > > >> Right now, in order to debug a libvirt QEMU VM in production, I am
> > > >> using a very strange approach:
> > > >> - the disk section of the domain XML is removed
> > > >> - the exact command line options to start the disk are specified at
> > > >>   the end of domain.xml within <qemu:commandline>, as described by Stefan:
> > > >>  
> > > >> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
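> > > >>
> > > >> For illustration, a minimal sketch of this workaround (the drive id
> > > >> and the exact option string here are made up, not taken from a real
> > > >> domain, and a matching -device argument is omitted):
> > > >>
> > > >>     <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
> > > >>       ...
> > > >>       <qemu:commandline>
> > > >>         <qemu:arg value='-drive'/>
> > > >>         <qemu:arg value='file=/var/lib/libvirt/images/rhel7.qcow2,format=qcow2,if=none,id=drive-scsi0,cache=none,aio=native,l2-cache-size=64M'/>
> > > >>       </qemu:commandline>
> > > >>     </domain>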
> > > >>
> > > >> The problem is that when debugging is finished and a viable
> > > >> combination of options is found, I cannot leave the VM in that state
> > > >> in production. This is the pain and the problem. For example, I have
> > > >> spent 3 days with the VM of one customer who blames us for slow I/O
> > > >> in the guest. I have found a very good combination of non-standard
> > > >> options which increases disk performance 5 times (not 5%). Currently
> > > >> I cannot put this combination into production, as libvirt does not
> > > >> see the disk.
> > > >>
> > > >> I propose to do a very simple thing; maybe I am not the first one
> > > >> here, but it would be nice to allow passing arbitrary options to the
> > > >> QEMU command line. This could be done in a very generic way if we
> > > >> allow specifying additional options inside the <driver> element,
> > > >> like this:
> > > >>
> > > >>     <disk type='file' device='disk'>
> > > >>       <driver name='qemu' type='qcow2' cache='none' io='native'
> > > >>               iothread='1'>
> > > >>           <option name='l2-cache-size' value='64M'/>
> > > >>           <option name='cache-clean-interval' value='32'/>
> > > >>       </driver>
> > > >>       <source file='/var/lib/libvirt/images/rhel7.qcow2'/>
> > > >>       <target dev='sda' bus='scsi'/>
> > > >>       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
> > > >>     </disk>
> > > >>
> > > >> and so on. The meaning (at least for QEMU) is quite simple: these
> > > >> options would just be appended to the end of the -drive command
> > > >> line. The meaning for other hypervisor drivers should be the same,
> > > >> and I think there are ways to pass generic options in them too.
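> > > >>
> > > >> With the XML above, the resulting command line would simply gain the
> > > >> extra options, e.g. something along the lines of (a sketch; the id=
> > > >> and if= parts depend on how libvirt builds the rest of the -drive
> > > >> string):
> > > >>
> > > >>     -drive file=/var/lib/libvirt/images/rhel7.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=native,l2-cache-size=64M,cache-clean-interval=32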
> > > > It is a general policy that we do *not* do generic option passthrough
> > > > in this kind of manner. We always want to represent concepts explicitly
> > > > with named attributes, so that if 2 hypervisors support the same concept
> > > > we can map it the same way in the XML.
> > >
> > > OK. How could I change the L2 cache size for a QCOW2 image?
> > > 
> > > For a 1 TB disk, fragmented in the guest, the performance loss is
> > > around 10 times. 10 TIMES. 1000%. The customer cannot wait for a
> > > proper fix in the next QEMU release, especially if we are able to
> > > provide the kludge specifically for him.
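> > >
> > > To put rough numbers on it: each qcow2 L2 table entry is 8 bytes and
> > > maps one cluster, so caching the L2 metadata for a whole image takes
> > > about disk_size * 8 / cluster_size bytes. With the default 64 KB
> > > clusters, a 1 TB image needs a 128 MB L2 cache, while QEMU's default
> > > cache is only 1 MB (enough for 8 GB of virtual disk); every access
> > > outside the cached range costs an extra metadata read from disk.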
> > 
> > We could explicitly allow the L2 cache size to be set in the XML, but
> > that is a pretty poor solution to the problem IMHO, as the mgmt
> > application has no a priori knowledge of whether a particular cache
> > size is going to be right for a particular QCow2 image.
> > 
> > For a sustainable solution, IMHO this really needs to be fixed in
> > QEMU, so that it either has a more appropriate default or, if a single
> > default is not possible, auto-tunes its cache size dynamically to suit
> > the characteristics of the qcow2 image.
> 
> A tradeoff between memory usage and performance is policy, and setting
> policy is the management layer's job, not qemu's. We can try to provide
> good defaults, but they are meant for manual users of qemu. libvirt is
> expected to configure everything exactly as it wants it instead of
> relying on defaults.

The question, though, is how an app is supposed to figure out the
optimal setting for the cache size. It seems to require knowledge
of the level of disk fragmentation and of guest I/O patterns, neither
of which are things we can know upfront. This means any attempt to
set the cache size is little more than ill-informed guesswork.


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|



