[libvirt] [PATCH 1/2] qemu: don't kill qemu process on restart if networkNotify fails

Laine Stump laine at laine.org
Wed Apr 26 21:43:00 UTC 2017


On 04/26/2017 02:39 PM, John Ferlan wrote:
> 
> 
> On 04/25/2017 12:33 PM, Laine Stump wrote:
>> Nothing that could happen during networkNotifyActualDevice() could
>> justify unceremoniously killing the qemu process, but that's what we
>> were doing.
> 
> So that galaxy in far far away land circa commit id '04711a0f' when all
> this was added -

Sure, in 2011. I think the statute of limitations has passed on that
though (actually I'm surprised there's still any evidence of it in the
git blame, what with all the code reorganization etc going on - that
tends to obscure meaningful/functional changes...)

> 
>>
>> In particular, new code added in commit 85bcc022 (first appearred in
> 
> appeared
> 
>> libvirt-3.2.0) attempts to reattach tap devices to their assigned
>> bridge devices when libvirtd restarts (to make it easier to recover
>> from a restart of a libvirt network). But if the network has been
>> stopped and *not* restarted, the bridge device won't exist and
>> networkNotifyActualDevice() will fail.
>>
>> This patch changes qemuProcessNotifyNets() to return void, so that
>> qemuProcessReconnect() will soldier on regardless of what happens (any
>> errors will still be logged, but we won't terminate the guest).
>>
>> Partially resolves: https://bugzilla.redhat.com/1442700
>> ---
>>
>> NB: there are many other things that might fail during reconnection of
>> a qemu process that shouldn't lead to killing the qemu
>> process. qemuProcessReconnect() could use a good audit to change what
>> happens when certain items fail.
>>
> 
> Nose game, not me ;-)

Nor me. I've done my part by pointing it out :-P

> 
>>
>>
>>  src/qemu/qemu_process.c | 9 +++------
>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index 53170d7..fcb5763 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -2825,7 +2825,7 @@ int qemuProcessStopCPUs(virQEMUDriverPtr driver,
>>  
>>  
>>  
>> -static int
>> +static void
>>  qemuProcessNotifyNets(virDomainDefPtr def)
>>  {
>>      size_t i;
>> @@ -2840,10 +2840,8 @@ qemuProcessNotifyNets(virDomainDefPtr def)
>>          if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_DIRECT)
>>             ignore_value(virNetDevMacVLanReserveName(net->ifname, false));
>>  
>> -        if (networkNotifyActualDevice(def, net) < 0)
>> -            return -1;
>> +        networkNotifyActualDevice(def, net);
> 
> Seems like networkNotifyActualDevice should then return a void too?
> And have it's comments/content changed...

That depends. If we go through the entire call stack and obliterate the
return codes, then in the future if we decide there is something that we
really do want to fail on, there will be more stuff to "un-undo". Also,
I'm not as comfortable obliterating the return code of a function that
called from another file. That's partially just superstition though.

> 
> 
> IIUC thought, essentially if the network goes away, the domain isn't
> necessarily shut down - it gets notified somehow that the network isn't
> there any more

Yeah, it's "notified" mainly by the fact that it's no longer receiving
any traffic. We *might* want to think about setting the guest's
interface ~IFF_UP, but who knows what unwanted side effects that could
have in the case of someone just restarting a network.

> and soldiers on. So from that aspect I can certain agree
> that just because we discover an error when libvirtd restarts doesn't
> mean we should now kill the domain.

Yeah, it's*very* passive-agressive (nay, agressive-agressive) to kill
the entire guest just because one little part of it has stopped working.
Much nicer to leave it running so that some amount of sane shutdown
might be performed even if the non-working piece can't be restored.


> 
> I think the concept behind the patch is fine - it's the "how much more"
> should be done...  (and yes with the eye on keeping backports to a
> minimum, yep understood).

I guess I can also nuke the return value from
networkNotiyActualDevice(). I'll redo it that way and resend.

> 
> 
> John
> 
>>      }
>> -    return 0;
>>  }
>>  
>>  static int
>> @@ -3480,8 +3478,7 @@ qemuProcessReconnect(void *opaque)
>>      if (qemuSecurityReserveLabel(driver->securityManager, obj->def, obj->pid) < 0)
>>          goto error;
>>  
>> -    if (qemuProcessNotifyNets(obj->def) < 0)
>> -        goto error;
>> +    qemuProcessNotifyNets(obj->def);
>>  
>>      if (qemuProcessFiltersInstantiate(obj->def))
>>          goto error;
>>
> 




More information about the libvir-list mailing list