[libvirt] [Qemu-devel] [RFC 0/2] Attempt to implement the standby feature for assigned network devices

Wed Dec 5 17:09:16 UTC 2018

On Wed, Dec 05, 2018 at 10:18:29 -0600, Michael Roth wrote:
> Quoting Sameeh Jubran (2018-10-25 13:01:10)
> > On Thu, Oct 25, 2018 at 5:06 PM Sameeh Jubran <sameeh at daynix.com> wrote:
> > > From: Sameeh Jubran <sjubran at redhat.com>
> > > Migration support:
> > >
> > > Pre migration or during setup phase of the migration we should send an
> > > unplug request to the guest to unplug the primary device. I haven't had
> > > the chance to implement that part yet but should do soon. Do you know
> > > what's the best approach to do so? I wanted to have a callback to the
> > > virtio-net device which tries to send an unplug request to the guest and
> > > if succeeds then the migration continues. It needs to handle the case where
> > > the migration fails and then it has to replug the primary device back.
> > I think that the "add_migration_state_change_notifier" API call can be used
> > from within the virtio-net device to achieve this, what do you think?
> 
> I think it would be good to hear from the libvirt folks (on Cc:) on this as
> having QEMU unplug a device without libvirt's involvement seems like it
> could cause issues. Personally I think it seems cleaner to just have QEMU
> handle the 'hidden' aspects of the device and leave it to QMP/libvirt to do
> the unplug beforehand. On the libvirt side I could imagine adding an option
> like virsh migrate --switch-to-standby-networking or something along
> that line to do it automatically (if we decide doing it automatically is
> even needed on that end).

I remember talking about this approach some time ago.

In general the migration itself is a very complex process which has too
many places where it can fail. The same applies to device hotunplug.
This series proposes to merge those two together into an even more
complex behemoth.

Few scenarios which don't have clear solution come into my mind:
- Since unplug request time is actually unbounded. The guest OS may
  arbitrarily reject it or execute it at any later time, migration may get
  stuck in a halfway state without any clear rollback or failure scenario.

- After migration, device hotplug may fail for whatever reason, leaving
  networking crippled and again no clear single-case rollback scenario.

Then there's stuff which requires libvirt/management cooperation
- picking of the network device on destination
- making sure that the device is present etc.

From managements point of view, bundling all this together is really not
a good idea since it creates a very big matrix of failure scenarios. In
general even libvirt will prefer that upper layer management drives this
externally, since any rolback scenario will result in a policy decision
of what to do in certain cases, and what timeouts to pick.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20181205/6dc298ce/attachment-0001.sig>