[libvirt] [PATCH v2 09/12] qemu: Start PR daemons on domain startup

Tue Mar 6 17:31:27 UTC 2018

On 03/02/2018 04:54 PM, John Ferlan wrote:
> 
> 
> On 02/21/2018 01:11 PM, Michal Privoznik wrote:
>> Before we exec() qemu we have to spawn pr-helper processes for
>> all managed reservations (well, technically there can only one).
> 
> "can be only one"
> 
> Since there can only be one why do we bother w/ plurality?

Yeah, this is leftover from v1 where the code was prepared for multiple
managed helpers per one domain.

> 
>> The only caveat there is that we should place the process into
>> the same namespace and cgroup as qemu (so that it shares the same
>> view of the system). But we can do that only after we've forked.
>> That means calling the setup function between fork() and exec().
> 
> Seems like a good note/comment in the code where
> qemuProcessSetupPRDaemons is called so that someone doesn't have to go
> back to the commit message in order to find out why the call was placed
> where it was.

Okay, I will add it. But we need some rules on this because sometimes I
add a comment and reviewer says putting the info in the commit message
is enough. Or vice versa.

> 
>>
>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>> ---
>>  src/qemu/qemu_process.c | 164 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 164 insertions(+)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index dc33eb7bc..33e0ad30c 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -2551,6 +2551,164 @@ qemuProcessResctrlCreate(virQEMUDriverPtr driver,
>>      ret = 0;
>>   cleanup:
>>      virObjectUnref(caps);
>> +
>> +    return ret;
>> +}
>> +
>> +
>> +static void
>> +qemuProcessKillPRDaemons(virDomainObjPtr vm)
> 
> Daemon*s* is is a misnomer
> 
>> +{
>> +    qemuDomainObjPrivatePtr priv = vm->privateData;
>> +
>> +    if (priv->prPid == (pid_t) -1)
>> +        return;
>> +
>> +    virProcessKillPainfully(priv->prPid, true);
>> +    priv->prPid = (pid_t) -1;
>> +}
>> +
>> +
>> +static int
>> +qemuProcessSetupOnePRDaemonHook(void *opaque)
>> +{
>> +    virDomainObjPtr vm = opaque;
>> +    size_t i, nfds = 0;
>> +    int *fds = NULL;
>> +    int ret = -1;
>> +
>> +    if (virProcessGetNamespaces(vm->pid, &nfds, &fds) < 0)
>> +        return ret;
> 
> <sigh> another false positive for Coverity since it for some reason
> believes "virProcessGetNamespaces" allocates memory that is stored into
> "fds" when there's an error returned.
> 
> Strangely, setting *nfdlist = 0 in the (ret < 0) part of cleanup: in
> virProcessGetNamespaces and going to cleanup here magically clears the
> error...
> 
>> +
>> +    if (nfds > 0 &&
>> +        virProcessSetNamespaces(nfds, fds) < 0)
>> +        goto cleanup;
>> +
>> +    ret = 0;
>> + cleanup:
>> +    for (i = 0; i < nfds; i++)
>> +        VIR_FORCE_CLOSE(fds[i]);
>> +    VIR_FREE(fds);
>> +    return ret;
>> +}
>> +
>> +
> 
> If we returned:
> 
>  -1 error, 0 started, and 1 unnecessary, then our caller could be a bit
> smarter about needing to run through the whole ndisks list.

This might help if there're thousands of disks, but I guess since we're
far from that it's unnecessary IMO. But I can change that if you want me
to :-)
> 
>> +static int
>> +qemuProcessSetupOnePRDaemon(virDomainObjPtr vm,
>> +                            virDomainDiskDefPtr disk)
>> +{
>> +    qemuDomainObjPrivatePtr priv = vm->privateData;
>> +    virQEMUDriverPtr driver = priv->driver;
>> +    virQEMUDriverConfigPtr cfg;
>> +    qemuDomainStorageSourcePrivatePtr srcPriv;
>> +    qemuDomainDiskPRDPtr prd;
>> +    char *pidfile = NULL;
>> +    pid_t cpid = -1;
>> +    virCommandPtr cmd = NULL;
>> +    virTimeBackOffVar timebackoff;
>> +    const unsigned long long timeout = 500000; /* ms */
>> +    int ret = -1;
>> +
>> +    if (priv->prPid != (pid_t) -1 ||
>> +        !virStoragePRDefIsManaged(disk->src->pr))
>> +        return 0;
> 
> Ah... so this is where we ensure there is only ever one pr-helper... and
> this !managed usage still doesn't help my earlier confusion.
> 
> In one mode we have:
> 
>   -object pr-manager-helper,id=pr-helper-sdb,\
>           path=/path/to/qemu-pr-helper.sock
> 
> and the other we have:
> 
>   -object pr-manager-helper,id=pr-helper0,\
>           path=/tmp/lib/domain--1-QEMUGuest1/pr-helper0.sock
> 
> But for "everything" (or both) we still have the qemu-pr-helper process.

Question is, who starts the process. Let's forget about PR for a while
and take a look on <hostdev/>-s. The also have @managed attribute.
Question there is who detaches the hostdev on domain startup. The
command line is generated the same in both cases. But if @managed =
'yes' then libvirt takes the extra step and detaches the hostdev from
host. Otherwise it relies on user (which is equal to admin for the sake
of this conversation) that it detached the hostdev themselves.

> For both the alias/id is added to the -drive command line using
> "file.pr-manager=$alias".

Sure. We have to.

> 
> In one mode we would start the qemu-pr-helper, but in the other we're
> not. So, if none of the lun's used the second mode, then what do they
> connect to?

To socket that's created by user spawned qemu-pr-helper. Alternatively
there were some discussion whether systemd's socket activation should be
implemented for qemu-pr-helper. In that case, it's going to be systemd
who starts the process once qemu tries to connect.

> 
> Furthermore, the managed mode helps decide which alias to use and of
> course which socket will be used.  With the alias, I had the earlier
> question/confusion over how many objects would be created using the
> "other" mode in the above examples since "theoretically speaking" of
> course the id= should be unique.  Then the path is just a client path to
> the socket which in the former mode is user defined and the latter mode
> is static or the same for every lun using that mode. So can we have more
> than one lun using the same client socket path?

Theoretically we can. But why bother? We can try to find unique socket
paths and add them just once. But then we have to:

a) allocate an unique alias for pr-manager-helper,
b) track use count on hot(un-)plug.

IMO it only complicates things. But sure, in the long run we can switch
to that approach.

> 
> I don't know if the above helps explain my confusion or makes it even
> harder to understand. I hope it helps.
> 
> 
>> +
>> +    cfg = virQEMUDriverGetConfig(driver);
>> +    srcPriv = QEMU_DOMAIN_STORAGE_SOURCE_PRIVATE(disk->src);
>> +    prd = srcPriv->prd;
>> +
>> +    if (!virFileIsExecutable(cfg->prHelperName)) {
>> +        virReportSystemError(errno, _("'%s' is not a suitable pr helper"),
>> +                             cfg->prHelperName);
>> +        goto cleanup;
>> +    }
>> +
>> +    if (!(pidfile = virPidFileBuildPath(cfg->stateDir, prd->alias)))
>> +        goto cleanup;
>> +
>> +    if (unlink(pidfile) < 0 &&
>> +        errno != ENOENT) {
>> +        virReportSystemError(errno,
>> +                             _("Cannot remove stale PID file %s"),
>> +                             pidfile);
>> +        goto cleanup;
>> +    }
>> +
>> +    if (!(cmd = virCommandNewArgList(cfg->prHelperName,
>> +                                     "-k", prd->path,
>> +                                     NULL)))
>> +        goto cleanup;
>> +
>> +    virCommandDaemonize(cmd);
>> +    virCommandSetPidFile(cmd, pidfile);
>> +    virCommandSetPreExecHook(cmd, qemuProcessSetupOnePRDaemonHook, vm);
>> +
>> +    if (virCommandRun(cmd, NULL) < 0)
>> +        goto cleanup;
>> +
>> +    if (virPidFileReadPath(pidfile, &cpid) < 0) {
>> +        virReportError(VIR_ERR_INTERNAL_ERROR,
>> +                       _("pr helper %s didn't show up"), prd->alias);
>> +        goto cleanup;
>> +    }
>> +
>> +    if (virTimeBackOffStart(&timebackoff, 1, timeout) < 0)
>> +        goto cleanup;
>> +    while (virTimeBackOffWait(&timebackoff)) {
>> +        if (virFileExists(prd->path))
>> +            break;
>> +
>> +        if (virProcessKill(cpid, 0) == 0)
>> +            continue;
>> +
>> +        virReportSystemError(errno,
>> +                             _("pr helper %s died unexpectedly"), prd->alias);
>> +        goto cleanup;
>> +    }
>> +
>> +    if (priv->cgroup &&
>> +        virCgroupAddMachineTask(priv->cgroup, cpid) < 0)
>> +        goto cleanup;
>> +
>> +    if (qemuSecurityDomainSetPathLabel(driver->securityManager,
>> +                                       vm->def, prd->path, true) < 0)
>> +        goto cleanup;
>> +
>> +    priv->prPid = cpid;
>> +    ret = 0;
>> + cleanup:
>> +    if (ret < 0) {
>> +        virCommandAbort(cmd);
>> +        virProcessKillPainfully(cpid, true);
>> +    }
>> +    virCommandFree(cmd);
>> +    if (unlink(pidfile) < 0 &&
> 
> I thought I pointed this out in the previous series, but Coverity gets
> grumpy here since @pidfile could be NULL if we jump to cleanup before
> it's created when virFileIsExecutable of prHelperName fails.

Ah okay.

Michal