[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCH 20/24] hostdev: Save netdev configuration of actual device



On Tue, 2016-03-15 at 14:33 -0400, John Ferlan wrote:
> On 03/14/2016 02:00 PM, Andrea Bolognani wrote:
> > 
> > Word of warning, since this is very likely to get extremely
> > verbose: I fully expect to slip, here and there, and refer not
> > to the *current* situation as of this patch, but to the
> > *desired* situation at the end of the series.
> 
> I've rebased to top of tree after your recent pushes... Seems like a
> good place to reset focus!  I think I need a virtual whiteboard.

So, I've already replied to your comments about patch 22 yesterday
and patch 21 earlier today.

I believe a nice chunk of your concerns have been addressed by my
comments in the latter, so I'll mostly go quickly over them. Of
course let me know if I've glossed over something that was
actually a separate concern.

After I'm done with this reply I'll post a shortened series that
includes the commits that still haven't been pushed, reordering
and fixing them as needed to avoid the problems you noticed in
patches 20 and 21.

(I might decide to call it a day and go home between these two
events, it's getting kind of late already :)

> > Such *desired* situation is (not explicitly enough?) laid out
> > in patch 19, but I'll expand on it here for clarity:
> > 
> >   * the active list and inactive list contain the actual devices
> 
> I understand/agree with the goal for actual devices in each list. One
> mental block is usage of virPCIDeviceListAdd for activeList and
> virPCIDeviceListAddCopy for inactiveList.

This mental block is one of the things patch 22 fixes :)

> The second mental block is
> whether these are for only managed devices (more on that later).

This should have been clarified by my comments to patch 21.

> >   * each device is contained either in the active list, or in
> >     the inactive list. It can't be in both
> 
> Sure that's where you're headed, but with the current code this isn't
> necessarily true as a device could be on the inactiveList and activeList
> during patch 5, but that's the goal of patch 22.

Not sure about the reference to patch 5, but yes, this is the
*desired* situation, ie. how the code is supposed to behave after
the whole series has been applied. If we were already in the
desired situation, then we would need no patches ;)

> >   * virPCIDevice instances in 'pcidevs' are only used for
> >     looking up actual devices in the bookkeeping lists, contain
> >     no valuable data and are disposable at any time
> 
> This is where things start getting a bit fuzzy. I agree with the
> concept; however, the current implementation has a gotcha. Consider how
> VIR_APPEND_ELEMENT works. Initially 'pcidevs' is created as a new List,
> then for each 'hostdev' device, a new virPCIDevicePtr is created and
> added to the pcidevs list using virPCIDeviceListAdd. It's taken a bit to
> have VIR_APPEND_ELEMENT processing sink in for me ;-), but as I read the
> Prepare code it makes use of this processing model instead of creating a
> separate copy for the activeList.

So just to be clear, this is again about using virPCIDeviceListAdd()
when adding devices to the active list and virPCIDeviceListAddCopy()
when adding them to the inactive list, right? So one of the very
issues patch 22 is supposed to clear up :)

> > > First off - perhaps something for the previous documentation patch -
> > > PrepareDevices is called during the qemuProcessLaunch processing (eg
> > > domain startup) for all known host devices. It's also called during
> > > qemuDomainAttachHostPCIDevice or libxlDomainAttachHostPCIDevice
> > > (hotplug) for the *one* new device to be attached.
> > >  
> > > The purpose of the function is to take the passed hostdev list, move the
> > > devices to a managed active host device list for the guest. Devices that
> > > do not have the managed attribute are not processed.
> > That's not true, though: unmanaged devices *are* processed, even
> > if less operations are performed on them. The part about
> > detaching them from the host is skipped, because it's supposed
> > to have been performed by the user alread, but that's it;
> > everything else (resetting them, moving them to the appropriate
> > bookkeeping list, storing information about what driver and
> > domain is using them) is just the same as managed devices.
> > 
> I think this is where the confusion for me lies. I seem to have also
> mistyped a bit - perhaps it's the blurred lines between current and
> future perfect (hah!) state.
> 
> Initially an unmanaged device is not placed on the inactiveList (e.g.
> step2 in the PreparePCIDevices code), but it is placed on the activeList
> (step5).
> 
> Later on, we can place an unmanaged device on the inactiveList either in
> virHostdevReattachPCIDevice using the "stolen" 'pcidevs' entry during
> virHostdevReAttachPCIDevices or virHostdevPCINodeDeviceDetach where
> virPCIDeviceDetach uses a copy of the device. Although patch22 removes
> the addition into the inactiveList if !managed (new step5 and deletion
> of lines in ReattachPCIDevice).  Still leaves the NodeDeviceDetach path
> to understand from whence that request came.

This should have been addressed by the comments on patch 21.

> > > > >       /* Step 8: Now set the original states for hostdev def */
> > > > >       for (i = 0; i < nhostdevs; i++) {
> > > > > -        virPCIDevicePtr pci;
> > > > > +        virPCIDevicePtr actual;
> > > > >           virDomainHostdevDefPtr hostdev = hostdevs[i];
> > > > >           virDomainHostdevSubsysPCIPtr pcisrc = &hostdev->source.subsys.u.pci;
> > > > >   
> > > > > @@ -659,24 +659,27 @@ virHostdevPreparePCIDevices(virHostdevManagerPtr mgr,
> > > > >           if (hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
> > > > >               continue;
> > > > >   
> > > > > -        pci = virPCIDeviceListFindByIDs(pcidevs,
> > > > > -                                        pcisrc->addr.domain,
> > > > > -                                        pcisrc->addr.bus,
> > > > > -                                        pcisrc->addr.slot,
> > > > > -                                        pcisrc->addr.function);
> > > > > +        /* We need to look up the actual device because it's the one
> > > > > +         * that contains the information we care about (unbind_from_stub,
> > > > > +         * remove_slot, reprobe) */
> > > > When a device goes from the inactivePCIHostdevs list to the
> > > > activePCIHostdevs list "at some point in time" in the future - does it
> > > > do a similar save?
> > >  
> > > I'll answer my own question - because this is the Prepare function and
> > > we don't have to worry about it since everything is on the active list.
> > Exactly, all devices that go from the inactive list to the active
> > list do so via this function... We don't need to handle this
> > anywhere else. And by the time this specific lookup is performed,
> > all devices have already been moved to the active list (step 5).
> > 
> Right, but if I read patch22 correctly (jumping slightly ahead), only
> managed devices would be placed onto the activeList. Currently all
> pcidevs are moved to activeList, but your change there moves from
> inactiveList to activeList, where AFAICT the inactiveList only has
> managed devices.
> 
> See my conundrum now?  The good news is the fridge is stocked with cold
> ones to help me ;-)

Again, comments on patch 21. And for some reason I kinda feel
thirsty now, weird ;)

> > > Before patch 15, step 3 would take any of the devices we removed from
> > > the activeList and perform a reset on them.
> > >  
> > > After patch 15, step 3 would do a similar function assuming that no
> > > device could not be on neither list.
> > NO device could NOT be on NEITHER list... My head is spinning :)
> > 
> > Actually, when we get here the devices are *guaranteed* to be in
> > neither list, because we removed them from the active list in
> > step 1. But that was the case even before patch 15, so it should
> > not be a problem.
> > 
> > > 
> > > Hopefully this all makes sense - I keep reading it and going back and
> > > fort between old and new code.  The comment left in the code after step
> > > 1 is most helpful...
> > I know how you feel! Thanks for sticking with it, hopefully the
> > comments I've provided so far (especially the ones at the top of
> > this message) are helpful in understanding the thought process
> > behind the changes.
> > 
> > I'll get back to you with more comments tomorrow - bet you can't
> > wait for them! ;)
> No good deed goes unpunished ;-):
> 
> 
> As I read the current PreparePCIDevices code:
> 
> Step1: Walk pcidevs, make some VFIO/ACS check... Make some PCI Node
> Device check... assume everything works as documented ;-)
> 
> Step2: Walk pcidevs, if managed, call virPCIDeviceDetach to place *copy*
> of pci onto the inactive list, else VIR_DEBUG message
> 
> Step3: Walk pcidevs, call virPCIDeviceReset on all devices (both those
> managed in inactiveList and those that are not managed)
> 
> Step4: Walk hostdevs, SRIOV specific to replace network config (close my
> eyes, pray this works)
> 
> Step5: Walk pcidevs, *adding* each device to activeList,. Since not
> copy, then device is on activeList *and* pcidevs at the same time,
> right? Step9 will "steal" from pcidevs to ensure virObjectUnref of
> pcidevs doesn't have virPCIDeviceListDispose free what's in activeList.
> (This is a key point...)
> 
> Step6: Walk pcidevs, remove pci from inactiveList (free all memory of
> the copy initially placed on inactiveList).
> 
> Step7: Walk pcidevs, if device found on activeList, call SetUsedBy
> 
> Step8: Walk hostdevs, if device on pcidevs, then set unbind_from_stub,
> remove_slot, and reprobe (since everything on pcidevs is also in the
> activeList - magically the activeList data gets updated).
> 
> Step9: Walk pcidevs, Stealing each element off of pcidevs (but doesn't
> virPCIDeviceFree the element so that virObjectUnref(pcidevs) won't find
> anything and do the free - because that element is on the activeList!).
> 
> My notes (or what keeps bouncing around while working through this):
> 
> 1. During the Prepare step, activeList add is not a copy, it uses
> VIR_APPEND_ELEMENT. IIUC, this means if 'dev' was dereferenced in
> virPCIDeviceListAdd after the APPEND, then there'd be a problem since
> dev would be set to NULL after the macro; however, since 'dev' is passed
> by value, that means the caller (e.g. the Prepare code) still can
> reference the element - it's just that the virPCIDevicePtr is listed in
> *two places*.

Exactly, and doing so causes nothing but grief - which is why
patch 22 gets rid of this behaviour completely.

> You'll note other places in the code will Steal from one
> list and place on another, but this code doesn't do that until step9
> where it steals all the pcidevs entries into apparently vaporware.
> "Theoretically" we really shouldn't reference activeList elements in
> pcidevs, but if we do reference them, then "magically" (so to speak) the
> activeList element would be updated. Personally it may be easier to use
> a *copy* for the activeList too and then let virObjectUnref handle the
> pcidevs entry deletion.

Claiming that I completely agree is an understatement!

> 2. During Prepare, inactiveList add is done by copy (although one could
> argue that virPCIDeviceListAddCopy could be replaced by
> VIR_APPEND_ELEMENT_COPY I suppose). So now there are *two* copies of the
> same device - this is good and again, I think the model the activeList
> should use.

Yes!

> 3. Current step9 just "steals" (or VIR_DELETE_ELEMENT) the element from
> pcidevs. This is "fine" since it's already/also on the activeList.
> You'll note other users of the macro would unref/free whatever memory
> was contained in the structure prior to removing from the list. Good
> thing we don't do that here ;-).

As noted previously, the current code is correct, just *extremely*
hard to get right! So kudos to whoever managed such a feat :)

> If we don't steal the entry, then
> virObjectUnref(pcidevs) will call virPCIDeviceListDispose for us - your
> patch22 change to remove this may not be right...

I believe I've already explained this in detail in my reply to your
comments on patch 22, so I won't repeat myself here. If you still
think this is unclear, just let me know.

> So given the current state of things, here's my view on the changes
> assuming my VIR_APPEND_ELEMENT viewpoint:
> 
> Change #1: (what';s in the commit message)
> Alters step8 to use activeList instead of pcidevs. This part looks
> correct (perhaps worthy of it's own ACKable patch).
> 
> Change #2: (not in the commit message)
> You jump into virHostdevReAttachPCIDevices and I then have to focus on a
> different algorithm and step process. Here's where I definitely believe
> you had patch22 in mind.

Yeah, I've already confirmed that was the case :)

> But focusing on the code as it exists now... This code walks the
> hostdev's and will find anything on the pcidevs list which was on the
> former activeList and ensures the NetConfigRestore is run (e.g. the
> corollary of step4 in Prepare).
> 
> Your change to ensure the device is on the inactiveList doesn't seem
> right at this point in time of the code. When you insert step2 in
> patch22, it would perhaps be more correct...

Yup.

> Without peeking ahead too much, I think the insert unmanaged device onto
> inactiveList done in virHostdevReattachPCIDevice is incorrect at this
> point. Even if not incorrect, it's confusing.

No, you're right, it's actually incorrect.

> Thinking in terms of current code - step4 will take the everything that
> was formerly on the active list and:
> 
> 1. Steal the pcidevs entry
> 
> 2. Call virHostdevReattachPCIDevice with the stolen lookaside pcidevs
> entry...  While it appears to be an actual, it isn't. Of course, I think
> you address that in patch 22, but trying not to get too far ahead.
> 
>    2a. If the device is not managed, then add to the inactiveList and
> return (call virPCIDeviceFree if unsuccessfully add). What really
> confuses me here is why we place onto the inactiveList if not managed;
> however, you address that in patch22. Still it could be addressed
> earlier if that is a separate bug...

I would have to go back to patch 22 to make sure this is really
handled correctly there: I'm pretty confident it is, but I've
been wrong before! Most recently, while writing this series :(

The incorrect behaviour you've noticed before has actually been
introduced by patch 15. Before that, 'pcidevs' in
virHostdevReAttachPCIDevices() was obtained by copying instances
off the active list, hence virHostdevReattachPCIDevice() would
really be passed (a copy of) an actual; now that's no longer the
case, and the code has become incorrect.

Will fix as part of the respin.

>    2b. If the device is managed, call virPCIDeviceReattach
> 
>       2b1. Fail if on activeList (miserable)
>       2b2. Unbind stub
>       2b3. Call virPCIDeviceListDel
>         -> Steal from inactiveList (search inactiveList for matching
> device by domain address, bus slot of the passed lookside pci device)
>         -> call virPCIDeviceFree on the inactiveList entry
> 
>    2c. Call virPCIDeviceFree on the lookaside list entry.
> 
> 3. virObjectUnref(pcidevs)
> 
>   -> If I read the code right - each entry was stolen and properly
> removed so the virPCIDeviceListDispose has nothing to do.
> 
> So it seems at this point, I would surmise virPCIDeviceReattach expects
> the pcidevs lookaside entry.

Nope, definitely wants the actual device.

> The problem being insertion onto the
> inactiveList of an unmanaged device (something we avoided in Prepare).

Well, unmanaged devices are supposed to end up in the inactive
list at the end of this function... They will be removed from
there by virHostdevPCINodeDeviceReAttach().

> And yes, I'm still curious about virHostdevPCINodeDeviceDetach and how
> that plays into things.

What exactly is confusing you about that function?


So well, it's almost dinner time here, respin's coming tomorrow
I guess :)

Cheers.

-- 
Andrea Bolognani
Software Engineer - Virtualization Team


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]