[libvirt] [PATCH v2 4/5] qemu_hotplug: Fix a rare race condition when detaching a device twice

Michal Privoznik mprivozn at redhat.com
Thu Mar 14 14:31:48 UTC 2019


On 3/14/19 3:14 PM, Peter Krempa wrote:
> On Thu, Mar 14, 2019 at 14:56:48 +0100, Michal Privoznik wrote:
>> On 3/14/19 2:18 PM, Peter Krempa wrote:
>>> On Thu, Mar 14, 2019 at 13:22:38 +0100, Michal Privoznik wrote:
> 
> [...]
> 
>>>
>>> How can this be considered success? Also this introduces a possible
>>> regression. The DEVICE_DELETED event should be fired only after the
>>> device was entirely unplugged. Claiming success before seeing the event
>>> can lead to another race when qemu deleted the device from the internal
>>> list so that 'device_del' does not see it any more but did not finish
>>> cleanup fully.
>>>
>>> We need to start the '*Remove' handler only after the DEVICE_DELETED
>>> event was received.
>>
>> I beg to differ. If we were to report error here users would see the API
>> failing with error "Device not found". So they'd run 'virsh dumpxml' only to
>> find the device there. I don't find such behaviour sane. If one API tells me
>> a device is not there then another one shall not tell otherwise.
> 
> Well. The user semantics can be confusing here. What we can't allow
> though is that some of the steps done in the qemuDomainRemove*Device
> will fail because qemu will still have some internal reference to some
> backend object.

I'm not quite sure I follow. qemuDomainRemove*Device will be run exactly 
once, never more. Running it more than once would be a problem, but I'm 
failing to see how my patch allows that. Can you shed more light on 
that please?

> What I'd find more of a problem is that I'd try to
> attach a similar device only to be told that it already exists.

I don't know what you mean here either. With my patches, not only do we 
enter the wait for the event again (thus widening the window in which the 
event may arrive), but we are also compliant with the detach 
semantics. Let's think of an extreme case: qemu fails to deliver the 
DEVICE_DELETED event. With my patches you'll get:

1) virsh detach-device-alias $dom $alias
Device detach request sent successfully

2) virsh detach-device-alias $dom $alias
Device detach request sent successfully

3) virsh detach-device-alias $dom $alias
Device detach request sent successfully
  ...

If we were to fail, as you suggest:
1) virsh detach-device-alias $dom $alias
Device detach request sent successfully

2) virsh detach-device-alias $dom $alias
monitor error: DeviceNotFound

3) virsh detach-device-alias $dom $alias
monitor error: DeviceNotFound


Now if you run 'virsh dumpxml $dom' as a 4th step (in both scenarios), the 
device is still there. So how can it be in the domain XML and not found 
at the same time? And if you try to attach it, everything will work: 
libvirt generates a different address to plug the device into, since it 
still sees the old one.
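
The two scenarios above can be mimicked with a toy model. This is only a sketch under stated assumptions, not libvirt or qemu code: ToyQemu, ToyDomain and the tolerate_not_found flag are hypothetical names made up for illustration, and the model simply assumes that qemu drops the device from its internal list on the first device_del and never delivers DEVICE_DELETED.

```python
SENT = "Device detach request sent successfully"
NOT_FOUND = "monitor error: DeviceNotFound"


class DeviceNotFound(Exception):
    """Stand-in for the monitor's DeviceNotFound error."""


class ToyQemu:
    """Hypothetical monitor that never emits DEVICE_DELETED."""

    def __init__(self, aliases):
        self.devices = set(aliases)

    def device_del(self, alias):
        if alias not in self.devices:
            raise DeviceNotFound(alias)
        # Gone from qemu's internal list, but no event is ever sent.
        self.devices.remove(alias)


class ToyDomain:
    """Hypothetical domain; xml_devices plays the role of the domain XML."""

    def __init__(self, aliases, tolerate_not_found):
        self.xml_devices = list(aliases)
        self.qemu = ToyQemu(aliases)
        self.tolerate_not_found = tolerate_not_found

    def detach(self, alias):
        try:
            self.qemu.device_del(alias)
        except DeviceNotFound:
            if not self.tolerate_not_found:
                return NOT_FOUND
            # Otherwise fall through and keep waiting for the event.
        # The event never arrives, so the device stays in xml_devices.
        return SENT


if __name__ == "__main__":
    for tolerate in (True, False):
        dom = ToyDomain(["net0"], tolerate_not_found=tolerate)
        for _ in range(3):
            print(dom.detach("net0"))
        print("still in XML:", "net0" in dom.xml_devices)
```

In both variants the device remains in the toy domain XML after three detach attempts; the variants differ only in whether the second and third calls report an error.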

Michal



