[libvirt] [PATCH] virhostdev: Fix PCI devices are still attatched to stub driver bug

John Ferlan jferlan at redhat.com
Thu Sep 20 15:22:16 UTC 2018



On 09/20/2018 06:35 AM, Wuzongyong (Euler Dept) wrote:
>> $SUBJ:
>>
>> s/attatched/attached
>>
>> s/bug//
>>
>> On 08/31/2018 03:34 AM, Wu Zongyong wrote:
>>
>> So, first off - I believe there are two things going on in this one patch.
>> Even though there is "some relationship" between the issues, the libvirtd
>> restart is kind of a corner case, while the change to readd the device to
>> inactiveDev's would affect both this libvirtd restart case
>> *and* the normal processing. It's taken me some time to come to that
>> conclusion - lots of details follow, but I'm also willing to accept the
>> changes are "related enough" for just one patch.
>>
>>> Currently, PCI devices will not be rebound to host drivers but
>>> attached to the stub driver when:
>>>
>>> 1) use libvirt to start a virtual machine with PCI devices assigned,
>>> then stop libvirtd process and shutdown the virtual machine. Finally,
>>> PCI devices are still bound to the stub driver instead of host drivers
>>> after libvirt start again.
>>> 2) use libvirt to shutdown a virtual machine wtih PCI devices
>>> assigned,
>>
>> s/wiht/with
>>
>>> then stop libvirtd process before libvirt try to rebind PCI devices to
>>
>> Be specific which API isn't being called - it really will help. Making a
>> reviewer figure it out in very specific/particular cases such as this
>> leads to "review delay". Is qemuHostdevReAttachDomainDevices called from
>> qemuProcessStop what you were referring to? The Stop not getting called
>> because the monitor channel is closed and thus the event for shutdown
>> won't be triggered.
>>
>>> host drivers. Finally, PCI devices are still bound to the stub driver
>>> after libvirt start again.
>>
>> Since you've done the research and so that I'm sure I'm following your
>> logic - it seems like you're chasing corner cases based on libvirtd
>> restart. In one instance the order is:
>>
>>     virsh start $dom
>>     service libvirtd stop  (or some other means?)
>>     <shutdown> the guest from within the $dom
>>
>> and the other instance is
>>
>>     virsh start @dom
>>     virsh destroy $dom
>>     service libvirtd stop
>>
>> and issues as a result of the subsequent "service libvirtd start".
>>
>> In the second instance it's not clear how your timing is "just right"
>> such that libvirtd can be stopped before the PCI devices can be returned
>> to the host PCI, but it's essentially the same as the first scenario at
>> least with respect to the subsequent libvirtd restart and discovery that
>> the guest is no longer running and needing to process that.
>>
>> What may also have helped is if the commit message indicated the "normal"
>> path taken that would allow the drivers to be rebound properly.
>>  Even if it's after the "---" for review backup assistance information.
>>
>> In edge cases like this, it's far easier to cut out stuff from a commit
>> message than it is to put yourself into the frame of reference of the
>> patch submitter
>>
>> Still what your patch does is use the virDomainHostdevOrigStatesPtr for a
>> domain that was shutdown, but when libvirtd restarts. I had to go a long
>> way back to find when the OrigStates code was added - commit
>> d84b36263 in particular. In that case, the order was start domain, restart
>> libvirtd, and destroy domain.
>>
>> So the slight difference to the condition for this patch is the domain was
>> shutdown while libvirtd was stopped.
>>
>> In any case, all of the above is "one issue"...
>>
>> Leaving the rest as a "second issue"
>>
>>>
>>> Notice that the comment on the top of virPCIDeviceDetach as follows:
>>>
>>> activeDevs should be a list of all PCI devices currently in use by a
>>> domain.inactiveDevs is a list of all PCI devices that libvirt has
>>
>>   * activeDevs is a list ...
>>   * inactiveDevs is a list ...
>>
>>> detached from the host driver + attached to the stub driver, but
>>> hasn't yet assigned to a domain.
>>>
>>> It's not reasonable that libvirt filter out devices that are either
>>> not active or not used by the current domain and driver. For devices
>>> belong
>>
>> s/belong/that belong/
>>
>>> to domains that has been shutdown before libvirt start, we should put
>>
>> s/has/have
>>
>> s/libvirt start, we should put them/libvirtd starts, they should be placed
>>
>>> them into inactiveDevs if then meet the condition that PCI devices
>>> have
>>
>> I'm finding it hard to read/parse/rewrite the rest of this sentence.
>>
>>> been detached from the host driver + attached to the stub driver.
>>>
>>> Moreover, we set orignal states of PCI devices when build PCI devices
>>
>> s/we set orignal/set the original/
>>
>> s/build/building/
>>
>>> list in virHostdevGetPCIHostDeviceList if states has been set in
>>> struct virDomainHostdevDefPtr.
>>
>>
>> So the "second issue" (as I see it) is that ReAttach processing wasn't
>> handling returning devices that had stub driver attachment (at some point
>> in time). This issue occurs regardless of the libvirtd restart and domain
>> shutdown processing order. Even though it's related, it's different and
>> would need its own patch.
>>
>>>
>>> Signed-off-by: Wu Zongyong <cordius.wu at huawei.com>
>>> ---
>>>  src/util/virhostdev.c | 20 ++++++++++++++++++--
>>>  1 file changed, 18 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c index
>>> ca79c37..ecf95e3 100644
>>> --- a/src/util/virhostdev.c
>>> +++ b/src/util/virhostdev.c
>>> @@ -235,6 +235,7 @@
>> virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int
>> nhostdevs)
>>>      for (i = 0; i < nhostdevs; i++) {
>>>          virDomainHostdevDefPtr hostdev = hostdevs[i];
>>>          virDomainHostdevSubsysPCIPtr pcisrc =
>>> &hostdev->source.subsys.u.pci;
>>> +        virDomainHostdevOrigStatesPtr origstates =
>>> + &hostdev->origstates;
>>>          virPCIDevicePtr pci;
>>>
>>>          if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) @@
>>> -262,6 +263,10 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr
>> *hostdevs, int nhostdevs)
>>>              virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_XEN);
>>>          else
>>>              virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_KVM);
>>> +
>>> +        virPCIDeviceSetUnbindFromStub(pci, origstates-
>>> states.pci.unbind_from_stub);
>>> +        virPCIDeviceSetRemoveSlot(pci, origstates-
>>> states.pci.remove_slot);
>>> +        virPCIDeviceSetReprobe(pci, origstates->states.pci.reprobe);
>>
>> This looks a lot like what virHostdevUpdateActivePCIDevices is doing - in
>> fact I think that is the libvirtd restart "bug" you're chasing -
>> qemuProcessReconnect will call qemuHostdevUpdateActiveDomainDevices
>> after the call to qemuConnectMonitor - at least w/r/t this PCI stub
>> interaction.
>>
>> Since the domain was stopped connecting to the monitor will fail (in my
>> case qemuMonitorOpenUnix), thus falling into qemuProcessStop, but without
>> having updated (or ReAttach'ing) the PCI devices. So, in your environment
>> without these two changes - could you check to see what happens if the
>> order is changed? Does that fix the issue?
>>
>> That is move the call to qemuHostdevUpdateActiveDomainDevices to right
>> after qemuProcessPrepareAllowReboot.
>>
>> Some more "detailed" thoughts I had on this while investigating...
>>
>> Both virHostdevPreparePCIDevices and virHostdevReAttachPCIDevices will
>> call virHostdevGetPCIHostDeviceList. However, in the case of the former,
>> "Step 7" is "set the original states for hostdev def".
>>
>> Following call traces for virHostdevPreparePCIDevices eventually leads
>> back to :
>>
>>   1. qemuMigrationDstPrepareAny (migration)
>>   2. qemuProcessStart (guest startup)
>>   3. qemuDomainAttachHostDevice (hotplug).
>>
>> IOW: This is the "normal" running mode.
>>
>> So I have a concern regarding the "order of operation" in this case
>> especially since step 7 has the caveat of (actual) being true before
>> setting the bools.
>>
>> Considering the "other pah" through virHostdevReAttachPCIDevices, callers
>> are:
>>
>>    1. libxlDomainAttachHostPCIDevice (an error: path)
>>    2. libxlDomainDetachHostPCIDevice (normal path)
>>    3. qemuHostdevReAttachPCIDevices (normal path)
>>    4. virHostdevReAttachDomainDevices (libxl domain close processing)
>>
>> Focusing on qemuHostdevReAttachPCIDevices I find 3 callers:
>>
>>    1. qemuHostdevReAttachDomainDevices (called by qemuProcessStop)
>>    2. qemuDomainAttachHostPCIDevice (hotplug error processing)
>>    3. qemuDomainRemovePCIHostDevice (hot unplug processing).
>>
>> So while setting things up seems to be the right thing to do in the
>> ReAttach case, the normal attach case it doesn't seem so. Perhaps too
>> generic.
>>
>>>      }
>>>
>>>      return pcidevs;
>>> @@ -1008,8 +1013,19 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr
>> mgr,
>>>                  continue;
>>>              }
>>>          } else {
>>> -            virPCIDeviceListDel(pcidevs, pci);
>>> -            continue;
>>
>> The above was added in commit 4cdbff3d5 by Andrea and just removed
>> inactiveDev's from the @pcidevs since his change added both onto @pcidevs
>> (at least that's my reading of it).
>>
>> But with your proposed change rather than deleting that from @pcidevs,
>> you're moving it to the inactiveDev's list *if* there's a stub.
>>
>> So what happens if there isn't a stub?  From this point forward @pcidevs
>> would contain both active and inactive devices.
>>
>> Then step 2 comes along and runs the @pcidevs "stealing" to place the
>> device on the inactiveDevs list.  So now wouldn't we have duplicate
>> devices there?
>>
>> Now if the new code went before the above 2 lines, then probably no big
>> deal - that way we're adding back to the inactive list for these stub
>> drivers and we're not keeping the device in @pcidevs meaning it wouldn't
>> be readded to inactive afterwards.
>>
>> Hopefully this makes sense.
>>
>>> +            int stub = virPCIDeviceGetStubDriver(pci);
>>> +            if (stub > VIR_PCI_STUB_DRIVER_NONE &&
>>> +                stub < VIR_PCI_STUB_DRIVER_LAST) {
>>> +                /* The device is bound to a known stub driver: add a
>> copy
>>> +                 * to the inactive list */
>>> +                VIR_DEBUG("Adding PCI device %s to inactive list",
>>
>> s/%s/'%s'/
>>
>> (both above and below)
>>
>>> +                          virPCIDeviceGetName(pci));
>>> +                if (virPCIDeviceListAddCopy(mgr->inactivePCIHostdevs,
>> pci) < 0) {
>>> +                    VIR_ERROR(_("Failed to add PCI device %s to the
>> inactive list"),
>>> +                              virGetLastErrorMessage());
>>> +                    virResetLastError();
>>> +                }
>>> +            }
>>
>> For reference, this is a copy of what's in virHostdevPreparePCIDevices
>> step 2. I think you need to do some extra debugging and be sure you're not
>> duplicating things.  IOW: Add some debug code to dump the active and
>> inactive lists.
>>
>> Although this is really long - I hope it all makes sense.
>>
>> John
> Very sorry for the late, I have been occupied with a new project these days. Very sorry.
> Also thanks for correcting my poor English.

No problem... We're all busy in different ways. The only hazard in
delayed followups for me is short term memory problems ;-). The
grammar/spelling is just part of the process - even people who are
native english speakers aren't always "good" at it.

> 
> Back to the question, you suggested to move the call to qemuHostdevUpdateActiveDomainDevices
> to right after qemuProcessPrepareAllowReboot.
> Indeed, adjusting the order can fix the problem. So, maybe we don't need to talk other aspect
> You mentioned? All you mentioned really make sense and I don't consider so much indeed.
> 
> Thanks,
> Zongyong Wu
> 
> 

If moving the call works, then send a patch with just that!  I thought
"logistically" it would work, but didn't have the empirical evidence.

If the "second issue" is no longer an issue, then fine we can just drop
and forget about it.

John




More information about the libvir-list mailing list