[libvirt] [PATCH 1/3] Introduce new APIs for spawning processes

Wed May 26 14:53:27 UTC 2010

On Tue, May 25, 2010 at 03:04:58PM -0600, Eric Blake wrote:
> On 05/25/2010 07:24 AM, Daniel P. Berrange wrote:
> > This introduces a new set of APIs in src/util/command.h
> > to use for invoking commands. This is intended to replace
> > all current usage of virRun and virExec variants, with a
> > more flexible and less error prone API.
> > ---
> >  src/Makefile.am              |    1 +
> >  src/util/command.c           |  782 ++++++++++++++++++++++++++++++++++++++++++
> >  src/util/command.h           |  197 +++++++++++
> >  tests/Makefile.am            |   14 +-
> >  tests/commanddata/test1.log  |   12 +
> >  tests/commanddata/test10.log |   14 +
> >  tests/commanddata/test11.log |   12 +
> >  tests/commanddata/test12.log |   12 +
> >  tests/commanddata/test13.log |   12 +
> >  tests/commanddata/test2.log  |   14 +
> >  tests/commanddata/test3.log  |   12 +
> >  tests/commanddata/test4.log  |   10 +
> >  tests/commanddata/test5.log  |    6 +
> >  tests/commanddata/test6.log  |   11 +
> >  tests/commanddata/test7.log  |    7 +
> >  tests/commanddata/test8.log  |   14 +
> >  tests/commanddata/test9.log  |   14 +
> >  tests/commandhelper.c        |  112 ++++++
> >  tests/commandtest.c          |  494 ++++++++++++++++++++++++++
> 
> Nice to see as much test additions as the code itself.
> 
> > +
> > +struct _virCommand {
> > +    int has_error;
> 
> s/int/bool/?
> 
> > +
> > +    int nargs;
> 
> s/int/size_t/
> 
> > +    char **args;
> > +
> > +    int nenv;
> 
> s/int/size_t/
> 
> > +    char **env;
> > +
> > +    fd_set preserve;
> 
> This implicitly limits us to FD_SETSIZE fd's (aka 32 on some platforms);
> is that acceptable, or should we be using a better structure?

For now this is sufficient. Once the work is complete though,
we can change the virExec() signature to accept a int[] instead
and avoid the limit.

> > +    int outfd;
> > +    int errfd;
> > +    int *outfdptr;
> > +    int *errfdptr;
> > +
> > +    virExecHook hook;
> > +    void *opaque;
> > +
> > +    pid_t pid;
> > +    char *pidfile;
> 
> Do we want to allow the parent process to control which directory the
> child will start in (defaulting to the parent's cwd)?

Yes, that's probably useful. Currently child retains current cwd,
unless the daemonize option is turned on, in which case it gets
set to /

> > +/*
> > + * Re-allow a specific capability
> > + */
> > +void virCommandAllowCap(virCommandPtr cmd,
> > +                        int capability ATTRIBUTE_UNUSED)
> > +{
> > +    if (!cmd || cmd->has_error)
> > +        return;
> > +
> > +    /* XXX ? */
> > +}
> 
> Do we have a use case for this yet, or should we just #if 0 this section
> until there is a need and a reasonable implementation?

Yeah, i'll disable it for now.

> > +void virCommandAddEnvPair(virCommandPtr cmd,
> > +                          const char *name,
> > +                          const char *value)
> > +{
> > +    char *env;
> > +
> > +    if (!cmd || cmd->has_error)
> > +        return;
> > +
> > +    if (virAsprintf(&env, "%s=%s", name, value ? value : "") < 0) {
> > +        cmd->has_error = 1;
> > +        return;
> > +    }
> > +
> > +    if (VIR_REALLOC_N(cmd->env, cmd->nenv + 2) < 0) {
> 
> This scales quadratically, and can lead to noticeable performance
> degredation if there are lots of calls to virCommandAddEnvPair.  Better
> would be to amortize the allocation costs by tracking the allocation
> size of cmd->env (in addition to cmd->nenv), and geometrically enlarging
> it as-needed (by a factor of 1.5x or 2x), rather than a reallocation on
> every addition.

Ok, I'll keep track of allocation separately from usage.

> > +
> > +/*
> > + * Add a NULL terminated list of args
> > + */
> > +void virCommandAddArgSet(virCommandPtr cmd,
> > +                         const char *const*vals)
> 
> Is it worth a varargs variant of this:
> 
> virCommandAddArgList(cmd, "a", "b", NULL)?

Yes, it quite possibly is worthwhile.

> > +/*
> > + * Feed the child's stdin from a string buffer
> > + */
> > +void virCommandSetInputBuffer(virCommandPtr cmd,
> > +                              const char *inbuf)
> > +{
> > +    if (!cmd || cmd->has_error)
> > +        return;
> > +
> > +    cmd->inbuf = inbuf;
> > +    cmd->infd = -1;
> > +    cmd->inpipe = -1;
> 
> Should we set has_error if the user calls both virCommandSetInputBuffer
> and virCommandSetInputFD?  (Detectable if infd or inpipe are not -1
> here; likewise detectable in virCommandSetInputFD if input is not NULL).

Yes, might as well.

> > +static int
> > +virCommandProcessIO(virCommandPtr cmd)
> 
> Documentation?
> 
> > +{
> > +    int infd = -1, outfd = -1, errfd = -1;
> > +    size_t inlen = 0, outlen = 0, errlen = 0;
> > +    size_t inoff = 0;
> > +
> > +    /* With an input buffer, feed data to child
> > +     * via pipe */
> > +    if (cmd->inbuf) {
> > +        inlen = strlen(cmd->inbuf);
> > +        infd = cmd->inpipe;
> > +    }
> > +
> > +    /* With out/err buffer, we're the outfd/errfd
> 
> s/we're //
> 
> > +     * have been filled with an FD for us */
> > +    if (cmd->outbuf)
> > +        outfd = cmd->outfd;
> > +    if (cmd->errbuf)
> > +        errfd = cmd->errfd;
> > +
> > +    for (;;) {
> > +        int i;
> > +        struct pollfd fds[3];
> > +        int nfds = 0;
> > +
> > +        if (infd != -1) {
> > +            fds[nfds].fd = infd;
> > +            fds[nfds].events = POLLOUT;
> > +            nfds++;
> > +        }
> > +        if (outfd != -1) {
> > +            fds[nfds].fd = outfd;
> > +            fds[nfds].events = POLLIN;
> > +            nfds++;
> > +        }
> > +        if (errfd != -1) {
> > +            fds[nfds].fd = errfd;
> > +            fds[nfds].events = POLLIN;
> > +            nfds++;
> > +        }
> 
> Should this initialization be floated outside of the for(;;) loop?

No, because infd, outfd and errfd may be set to -1 on
any iteration of the loop.

> 
> > +
> > +        if (nfds == 0)
> > +            break;
> > +
> > +        if (poll(fds,nfds, -1) < 0) {
> > +            if ((errno == EAGAIN) || (errno == EINTR))
> > +                continue;
> > +            virReportSystemError(errno, "%s",
> > +                                 _("unable to poll on child"));
> > +            return -1;
> > +        }
> > +
> > +        for (i = 0; i < nfds ; i++) {
> > +            if (fds[i].fd == errfd ||
> > +                fds[i].fd == outfd) {
> > +                char data[1024];
> > +                char **buf;
> > +                size_t *len;
> > +                int done;
> > +                if (fds[i].fd == outfd) {
> > +                    buf = cmd->outbuf;
> > +                    len = &outlen;
> > +                } else {
> > +                    buf = cmd->errbuf;
> > +                    len = &errlen;
> > +                }
> > +
> > +                done = read(fds[i].fd, data, sizeof(data));
> > +                if (done < 0) {
> > +                    if (errno != EINTR &&
> > +                        errno != EAGAIN) {
> > +                        virReportSystemError(errno, "%s",
> > +                                             _("unable to write to child input"));
> > +                        return -1;
> > +                    }
> > +                } else if (done == 0) {
> > +                    if (fds[i].fd == outfd)
> > +                        outfd = -1;
> > +                    else
> > +                        errfd = -1;
> > +                } else {
> > +                    if (VIR_REALLOC_N(*buf, *len + done + 1) < 0) {
> > +                        virReportOOMError();
> > +                        return -1;
> > +                    }
> > +                    memmove(*buf + *len, data, done);
> 
> s/memmove/memcpy/ - there's no overlap between data and buf
> 
> > +                    *len += done;
> > +                    (*buf)[*len] = '\0';
> > +                }
> > +            } else {
> > +                int done;
> > +
> > +                done = write(infd, cmd->inbuf + inoff,
> > +                             inlen - inoff);
> > +                if (done < 0) {
> > +                    if (errno != EINTR &&
> > +                        errno != EAGAIN) {
> > +                        virReportSystemError(errno, "%s",
> > +                                             _("unable to write to child input"));
> > +                        return -1;
> > +                    }
> > +                } else {
> > +                    inoff += done;
> > +                    if (inoff == inlen) {
> > +                        close(infd);
> > +                        infd = -1;
> > +                    }
> > +                }
> > +            }
> > +
> > +        }
> > +    }
> > +
> > +    return 0;
> > +
> > +}

> > +/*
> > + * Run the command asynchronously
> > + * Returns -1 on any error executing the
> > + * command. Returns 0 if the command executed.
> > + */
> > +int virCommandRunAsync(virCommandPtr cmd,
> > +                       pid_t *pid)
> > +{
> > +    int ret;
> > +    char *str;
> > +
> > +    if (!cmd || cmd->has_error) {
> > +        virReportOOMError();
> > +        return -1;
> > +    }
> > +
> > +    if (cmd->pid != -1) {
> > +        virCommandError(VIR_ERR_INTERNAL_ERROR,
> > +                        _("command is already running as pid %d"),
> > +                        cmd->pid);
> > +        return -1;
> > +    }
> > +
> > +    str = virArgvToString((const char *const *)cmd->args);
> > +    VIR_DEBUG("About to run %s", str ? str : cmd->args[0]);
> > +    VIR_FREE(str);
> > +
> > +    ret = virExecWithHook((const char *const *)cmd->args,
> > +                          (const char *const *)cmd->env,
> > +                          &cmd->preserve,
> > +                          &cmd->pid,
> > +                          cmd->infd,
> > +                          cmd->outfdptr,
> > +                          cmd->errfdptr,
> > +                          cmd->flags,
> > +                          cmd->hook,
> > +                          cmd->opaque,
> > +                          cmd->pidfile);
> 
> Should virExecWithHook be moved into this new file?

Yes, once all callers of virRun/virExec are ported to the
new API, we'll move it and make it static.

> > +int main(int argc, char **argv) {
> > +    int i, n;
> > +    char **origenv;
> > +    char **newenv;
> > +    FILE *log = fopen(abs_builddir "/commandhelper.log", "w");
> > +
> > +    if (!log)
> > +        goto error;
> > +
> > +    for (i = 1 ; i < argc ; i++) {
> > +        fprintf(log, "ARG:%s\n", argv[i]);
> > +    }
> > +
> > +    origenv = environ;
> > +    n = 0;
> > +    while (*origenv != NULL) {
> > +        n++;
> > +        origenv++;
> > +    }
> > +
> > +    if (VIR_ALLOC_N(newenv, n) < 0) {
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> > +    origenv = environ;
> > +    n = i = 0;
> > +    while (*origenv != NULL) {
> > +        newenv[i++] = *origenv;
> > +        n++;
> 
> Why set i = 0, and increment both i and n, when you could have just done
> newenv[n++] = *origenv
> 
> > +        origenv++;
> > +    }
> > +    qsort(newenv, n, sizeof(newenv[0]), envsort);
> > +
> > +    for (i = 0 ; i < n ; i++) {
> > +        fprintf(log, "ENV:%s\n", newenv[i]);
> > +    }
> > +
> > +    for (i = 0 ; i < sysconf(_SC_OPEN_MAX) ; i++) {
> 
> Will this even compile on mingw?  Gnulib provides getdtablesize() (oops,
> it's still LGPLv3) which portably lets you determine a nice maximum
> bound on the number of open files, even when sysconf() is not available.

Probably not. For now all this code should be disabled on mingw,
since there is no fork+exec  there anyway.  

Since this code hides the while fork+exec pairing though, we might
be able to add an impl that uses the native windows process spawning
APIs (albeit with possible functional limitations).

> > diff --git a/tests/commandtest.c b/tests/commandtest.c
> > new file mode 100644
> > index 0000000..c66d345
> > --- /dev/null
> > +++ b/tests/commandtest.c
> > @@ -0,0 +1,494 @@
> > +#include <config.h>
> > +
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <signal.h>
> > +
> > +#include "testutils.h"
> > +#include "internal.h"
> > +#include "nodeinfo.h"
> > +#include "util.h"
> > +#include "memory.h"
> > +#include "command.h"
> > +
> > +#ifndef __linux__
> 
> Is this test really linux-specific?  It seems like other capable
> platforms, like cygwin, might be able to run it just fine.

We've only really supported mingw before, which lacks
fork+exec. Guess we should explicitly try to detect
availability of fork+exec

> > +static int
> > +mymain(int argc, char **argv)
> > +{
> > +    int ret = 0;
> > +    char cwd[PATH_MAX];
> > +
> > +    abs_srcdir = getenv("abs_srcdir");
> > +    if (!abs_srcdir)
> > +        abs_srcdir = getcwd(cwd, sizeof(cwd));
> > +
> > +    progname = argv[0];
> > +
> > +    if (argc > 1) {
> > +        fprintf(stderr, "Usage: %s\n", progname);
> > +        return(EXIT_FAILURE);
> > +    }
> > +
> > +    if (chdir("/tmp") < 0)
> > +        return(EXIT_FAILURE);
> 
> Do we really want to risk spurious test failures by running directly in
> /tmp, or should we be creating a secure subdirectory?

The problem is that the expected output datafiles have the
directory hardcoded in them. I could make a sub-dir, and then
do a pattern replacement on the datafiles after loading them.

Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|