[libvirt] [PATCH v1 1/3] adding unplug_timeout QEMU conf

Daniel Henrique Barboza danielhb413 at gmail.com
Mon Aug 19 10:07:38 UTC 2019


Hi,

On 8/19/19 6:16 AM, Ján Tomko wrote:
> On Sun, Aug 18, 2019 at 06:45:29PM -0300, Daniel Henrique Barboza wrote:
>> For some architectures and setups, device removal can take
>> longer than the default 5 seconds. This results in commands
>> such as 'virsh setvcpus' to fire timeout messages even if
>> the actual operation happened in the guest, causing confusion
>> for the user.
>>
>
> The commit that introduced this error message:
>    commit e3229f6e4461cd1721dc68a32e16ab1718ae716e
>        qemu: hotplug: Add support for VCPU unplug
>
> specifically says that we treat this differently than regular device
> detach:
>
>    As the new code is using device_del all the implications of using it
>    are present. Contrary to the device deletion code, the vcpu deletion
>    code fails if the unplug request is not executed in time.
>
> Technically, we already did execute the unplug request so we lie to the
> user saying "operation failed".
>
> Maybe we can revisit the decision? [cc-ing pkrempa who added this]
>


I have thought about making setvcpus asynchronous when it is a
device_del operation. This would be more code (perhaps a new command
to do that? Or a --asynchronous option?), and we can set user expectations
properly by making it clear that this is a unplug request command, and the
user will need to check the result personally in the guest.

But then, if we are prepared to tell the user "go check yourself" we can 
simply
change the current timeout message to say "vcpu unplug [...] timeout,
check unplug status in the guest". This would be clearer, and the user
wouldn't  automatically assume that timeout == operation failed.

Another thing we can do, instead of exposing the option to the user (which
has a good potential for disaster - hence why I added a minimal value), is
to simply set the timeout to a greater value (10? 15?) and be done with it.
If we set the timeout to 15 seconds and change the timeout message to
let the user know "we don't know, you'll need to check", like I said above,
we have more resilience and will not alarm the user if a timeout still 
occurs.
We'll also avoid exposing the timeout configuration like I'm doing here.


>> This patch adds a new qemu.conf parameter called 'unplug_timeout'
>> to handle these cases. If left unset, the current default
>> timeout is used. To avoid user 'experimentation' with small
>> timeouts, the current timeout is also the minimal value
>> allowed.
>>
>
> The reason for this timeout is that we originally promised something
> that we cannot deliver - a synchronous device detach API, while the
> operation itself is asynchronous. I'm not a fan of exposing it and
> making it configurable.
>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413 at gmail.com>
>> ---
>> src/qemu/libvirtd_qemu.aug         |  3 +++
>> src/qemu/qemu.conf                 |  4 ++++
>> src/qemu/qemu_conf.c               | 26 ++++++++++++++++++++++++++
>> src/qemu/qemu_conf.h               |  5 +++++
>> src/qemu/qemu_driver.c             |  2 ++
>> src/qemu/test_libvirtd_qemu.aug.in |  1 +
>> 6 files changed, 41 insertions(+)
>>
>
> [...]
>
>> diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
>> index 0cbddd7a9c..29824e4e35 100644
>> --- a/src/qemu/qemu_conf.h
>> +++ b/src/qemu/qemu_conf.h
>> @@ -214,6 +214,8 @@ struct _virQEMUDriverConfig {
>>     gid_t swtpm_group;
>>
>>     char **capabilityfilters;
>> +
>> +    unsigned int unplugTimeout;
>> };
>>
>> /* Main driver state */
>> @@ -294,6 +296,9 @@ struct _virQEMUDriver {
>>
>>     /* Immutable pointer, self-locking APIs */
>>     virHashAtomicPtr migrationErrors;
>> +
>> +    /* Immutable value */
>> +    unsigned int unplugTimeout;
>> };
>>
>
> Why store this value twice?

I wanted the value to be available at the driver object, but saw that the
parsing of the reading file put stuff in config.

However, just realized that we get to the cfg via qemu_driver->config
(as long as the lock is being held, which I think it's the case for all 
unplug
operations).

In case we still want this configurable timeout solution, I'll fix this 
in the
next spin.


Thanks,


DHB


>
> Jano




More information about the libvir-list mailing list