
Re: [libvirt] [PATCH v1 1/3] adding unplug_timeout QEMU conf

On 8/30/19 8:15 AM, Christophe de Dinechin wrote:
Daniel P. Berrangé writes:

The reason for this timeout is that we originally promised something
that we cannot deliver - a synchronous device detach API, while the
operation itself is asynchronous. I'm not a fan of exposing it and
making it configurable.
I'm especially *not* a fan because the commit message says this is
a problem on certain architectures. Since we know what those arches
are, we should use a larger timeout for those arches out of the box.
Requiring the admin to set a config param to fix the architectures is
a super unpleasant out-of-the-box experience.
True, but note that 5 seconds is already close to the
attention-span limit for users [1]. So increasing it to 10s might
lead people to believe things are stuck, unless you provide some sort
of feedback that this is normal.


Interesting link, thanks.

About user feedback for long response delays: we're already
breaking this guideline with the setvcpus command, at least with PowerPC
guests when many vcpus are unplugged. Here's an example in
which the command completes without hitting the
timeout error (the guest is idle, so vcpu unplug is fast in this case):

--- guest booted with 1 vcpu - added 39 extra vcpus. Operation takes
a second, it's fast  ---

[danielhb kop5 libvirt]$ sudo ./run tools/virsh setvcpus vcpus-test 40 --live

[danielhb kop5 libvirt]$
[danielhb kop5 libvirt]$

--- removing them back ----

[danielhb kop5 libvirt]$ time sudo ./run tools/virsh setvcpus vcpus-test 1 --live

real    0m21.695s
user    0m0.119s
sys    0m0.000s
[danielhb kop5 libvirt]$

This happens on PowerPC because the timeout applies per device rather
than to the whole operation. Since I'm unplugging 39 devices and the
5-second timeout restarts for each one, in theory the user
can wait close to 39*5 = 195 seconds with the terminal frozen.
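To make the arithmetic concrete, here is a minimal C sketch contrasting the two ways the timeout could be applied. The helper names are hypothetical and this is not libvirt code; it only restates the per-device versus whole-operation bound described above.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helpers, for illustration only (not libvirt code). */

/* Per-device timeout: every unplug request gets a fresh window, so the
 * worst-case blocking time grows linearly with the device count. */
static unsigned int
worst_case_per_device_secs(unsigned int ndevices, unsigned int timeout_secs)
{
    /* Guard against unsigned overflow in the multiplication. */
    assert(timeout_secs == 0 || ndevices <= UINT_MAX / timeout_secs);
    return ndevices * timeout_secs;
}

/* Whole-operation timeout: a single window bounds the entire request,
 * however many devices it covers. */
static unsigned int
worst_case_per_operation_secs(unsigned int ndevices, unsigned int timeout_secs)
{
    (void)ndevices; /* the bound does not depend on the device count */
    return timeout_secs;
}
```

With the 39 vcpus from the example, worst_case_per_device_secs(39, 5) is 195 seconds, and a 10-second per-device timeout would raise the worst case to 390 seconds, while a whole-operation timeout would stay at a single window.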

Now, if we are to adhere to such UX standards (IMO, we should), I propose
the following:

- short term: increase the PowerPC timeout to 10 seconds per device. Following
the UX guideline above, this is as far as we can go without warning the
user about the delay;

- short term: for PowerPC guests, tune the 'setvcpus' message to warn the user
that the operation can take some time to complete;

These two are simple changes and I can get them done for the next release
without much trouble.
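The first short-term item, combined with Daniel's point that affected architectures should get a larger timeout out of the box rather than through a config knob, could look roughly like the C sketch below. The enum, macro names and function are hypothetical, invented for illustration; they are not libvirt's virArch or its actual hotplug code. Only the 5-second default and the proposed 10-second PowerPC value come from this thread.

```c
#include <assert.h>

/* Hypothetical architecture tags; a real implementation would use
 * libvirt's own architecture enum instead. */
typedef enum {
    SKETCH_ARCH_X86_64,
    SKETCH_ARCH_PPC64,
    SKETCH_ARCH_PPC64LE,
    SKETCH_ARCH_AARCH64,
} sketch_arch;

/* Default per-device unplug timeout, in milliseconds. */
#define SKETCH_UNPLUG_TIMEOUT_MS        (5 * 1000ULL)
/* Proposed larger per-device unplug timeout for PowerPC guests. */
#define SKETCH_UNPLUG_TIMEOUT_PPC64_MS  (10 * 1000ULL)

/* Compile-time sanity check that the PowerPC value is the larger one. */
static_assert(SKETCH_UNPLUG_TIMEOUT_PPC64_MS > SKETCH_UNPLUG_TIMEOUT_MS,
              "PowerPC unplug timeout should exceed the default");

/* Pick the per-device unplug timeout from the guest architecture,
 * so no admin configuration is required. */
static unsigned long long
sketch_unplug_timeout_ms(sketch_arch arch)
{
    switch (arch) {
    case SKETCH_ARCH_PPC64:
    case SKETCH_ARCH_PPC64LE:
        return SKETCH_UNPLUG_TIMEOUT_PPC64_MS;
    default:
        return SKETCH_UNPLUG_TIMEOUT_MS;
    }
}
```

Applying the change to every architecture, as discussed further below, would amount to dropping the PowerPC cases and changing the single default.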

- mid/long term: I can look into the PowerPC guest implementation, see whether
device_del events are being fired in QMP, and implement a better UX with
more information about how the process is going. Something like "vcpu 1 out
of 30 unplugged", "vcpu 2 out of 30 unplugged", a progress bar, or
whatever best tells the user that the operation is still in progress.
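The per-device progress messages from the mid/long-term item could come from a small formatting helper such as the hypothetical C sketch below. The function name is invented, and the sketch assumes that some per-device completion signal exists to drive it, which is exactly what still needs investigating on PowerPC.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats a progress line such as
 * "vcpu 1 out of 30 unplugged" into buf. Returns the length the full
 * message would have (snprintf semantics), so a return value >= buflen
 * means the message was truncated. */
static int
sketch_format_unplug_progress(char *buf, size_t buflen,
                              unsigned int done, unsigned int total)
{
    assert(done <= total);
    return snprintf(buf, buflen, "vcpu %u out of %u unplugged", done, total);
}
```

A caller would invoke it once each time a device removal is confirmed, with done running from 1 to total, and print the resulting buffer.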

Note that I'm suggesting PowerPC-only changes because of what Daniel said
earlier: we can't impact other users over something that, at first glance,
only PowerPC does differently. I have a hunch that we should do this for all
archs, but I can't defend that claim without testing on x86 at least.
These short-term changes are easy to apply across the board, though, so it
would just be a matter of removing the "if PowerPC" condition from them.

What do you think?



|:https://berrange.com       -o-https://www.flickr.com/photos/dberrange  :|
|:https://libvirt.org          -o-https://fstop138.berrange.com  :|
|:https://entangle-photo.org     -o-https://www.instagram.com/dberrange  :|
Christophe de Dinechin (IRC c3d)
