[Libvir] [PATCH] virDomainMigrate version 4 (for discussion only!)

Mon Jul 23 10:00:21 UTC 2007

Daniel Veillard wrote:
>> virDomainPtr
>>   virDomainMigrate (virDomainPtr domain, virConnectPtr dconn,
>>                     unsigned long flags, const char *dname,
>>                     const char *uri, unsigned long resource);
> 
>   I would probably rename resource as 'bandwidth' it's more precise.

Yes, I think you're right.

>> The caller may set dname, uri and resource to 0/NULL and forget about 
>> them.  Or else the caller may set, in particular, uri to allow for more 
>> complicated migration strategies (especially for qemu).
>>
>> https://www.redhat.com/archives/libvir-list/2007-July/msg00249.html
> 
>   Sounds fine, I would have loved a simpler API though.

Me too.  However I think that all the parameters have their place.  The 
only change I might suggest would be to bring back the list of optional 
parameters (see the first version of the API), and move things like 
flags, dname and resource^Wbandwidth there.  (The URI however is quite 
fundamental).  I don't know what you think of that.  Having lists of 
parameters is flexible but in its own way somewhat complicated.

>> As outlined in the diagram in this email:
>> https://www.redhat.com/archives/libvir-list/2007-July/msg00264.html
>> migration happens in two stages.
> 
>   To some extend that's "implementation details" compared to the API
> but I understand why two steps are needed under the hood, fine by me.
> 
>> Firstly we send a "prepare" message to the destination host.  The 
>> destination host may reply with a cookie.  It may also suggest a URI (in 
>> the current Xen implementation it just returns gethostname).  Secondly 
>> we send a "perform" message to the source host.
>>
>> Correspondingly, there are two new driver API functions:
>>
>>   typedef int
>>       (*virDrvDomainMigratePrepare)
>>                       (virConnectPtr dconn,
>>                        char **cookie,
>>                        int *cookielen,
>>                        const char *uri_in,
>>                        char **uri_out,
>>                        unsigned long flags,
>>                        const char *dname,
>>                        unsigned long resource);
>>
>>   typedef int
>>       (*virDrvDomainMigratePerform)
>>                       (virDomainPtr domain,
>>                        const char *cookie,
>>                        int cookielen,
>>                        const char *uri,
>>                        unsigned long flags,
>>                        const char *dname,
>>                        unsigned long resource);
> 
>   I wonder if 2 steps are really sufficient. I have the feeling that a third
> step virDrvDomainMigrateFinish() might be needed, it could for example 
> resume on the target side and also verify the domain is actually okay.
> That could improve error handling and feels a lot more like a transactional
> system where you really want an atomic work/fail operation and nothing else.

Yes.  It's important to note that the internals may be changed later, 
although that may complicate things if we want to allow people to run 
incompatible versions of the daemons.  [Another email on that subject is 
coming up after this one].

I'm not sure how exactly what you propose would work in the Xen case. 
In the common error case (incompatible xend leading to domains being 
eaten), the domain is actually created for a short while on the dest 
host.  It dies later, but it seems possible to me that there is a race 
where domainMigrateFinish could return "OK", and yet the domain would 
fail later.  In another case -- where the domain could not connect to 
its iSCSI backend -- this could be even more common.

Note also that there is a third step in the current code.  After 
domainMigrate has finished, the controlling client then does 
"virDomainLookupByName" in order to fetch the destination domain object. 
  This is subject to the race conditions in the preceeding paragraph.

>> There are two corresponding wire messages 
>> (REMOTE_PROC_DOMAIN_MIGRATE_PREPARE and 
>> REMOTE_PROC_DOMAIN_MIGRATE_PERFORM) but they just do dumb argument 
>> shuffling, albeit rather complicated because of the number of arguments 
>> passed in and out.
>>
>> The complete list of messages which go across the wire during a 
>> migration is:
>>
>>   client -- prepare --> destination host
>>   client <-- prepare reply -- destination host
>>   client -- perform --> source host
>>   client <-- perform reply -- source host
>>   client -- lookupbyname --> destination host
>>   client <-- lookupbyname reply -- destination host
> 
>   Okay, instead of trying to reuse lookupbyname to assert completion,
> I would rather make a third special entry point. Sounds more generic
> to me, but again it's an implementation point, not a blocker, all this
> is hidden behind the API.

Noted.

>> Xen URIs
>> --------
>>
>> Xen recognises the following forms of URI:
>>   hostname
>>   hostname:port
>>   tcp://hostname/
>>   tcp://hostname:port/
> 
>  From an URI perspective "tcp" is not proper IMHO, it should be a protocol
> name, make that xenmigr or something. 

OK.

>> Capabilities
>> ------------
>>
>> I have extended capabilities with <migration_features>.  For Xen this is:
>>
>> <capabilities>
>>   <host>
>>    <migration_features>
>>      <live/>
>>      <uri_transports>
>>        <uri_transport>tcp</uri_transport>
>>      </uri_transports>
>>    </migration_features>
> 
>   Nice, but what is the expected set of values for uri_transport ?
> Theorically that can be any scheme name from rfc2396 (or later)
>   scheme        = alpha *( alpha | digit | "+" | "-" | "." )
> 
>   unless I misunderstood this.

I think I'm misunderstanding you.  In the Xen case it would be changed 
to <uri_transport>xenmigr</uri_transport>.

> [...]
>> +/* Domain migration flags. */
>> +#define VIR_MIGRATE_LIVE 1
>> +
>> +/* Domain migration. */
>> +virDomainPtr virDomainMigrate (virDomainPtr domain, virConnectPtr dconn,
>> +			       unsigned long flags, const char *dname,
>> +			       const char *uri, unsigned long resource);
> 
>   I would rename resource to bandwidth :-)
> 
> [...]
>>  #define REMOTE_CPUMAPS_MAX 16384
>> +#define REMOTE_MIGRATE_COOKIE_MAX 256
> 
>   hum, what is that ? Sorry to show up ignorance, feel free to point
> me to an xdr FAQ !

In XDR you can either have unlimited strings (well, the limit is 
2^32-1), or you can impose a limit on their length.  Some common types 
in XDR notation:

   Unlimited strings:      string foo<>;
   Limited-length strings: string foo<1000>;
   Fixed-length strings:   char foo[1000];
   Byte arrays:            opaque foo[1000];

Now if we just defined all strings as <> type, then the 
automatically-built XDR receivers would accept unlimited amounts of data 
from a buggy or malicious client.  In particular they would attempt to 
allocate large amounts of memory, crashing the server[1].  So instead we 
impose upper limits on the length of various strings.  They are defined 
at the top of qemud/remote_protocol.x.

>> @@ -632,6 +663,8 @@ enum remote_procedure {
>>  	REMOTE_PROC_DOMAIN_GET_SCHEDULER_PARAMETERS = 57,
>>  	REMOTE_PROC_DOMAIN_SET_SCHEDULER_PARAMETERS = 58,
>>  	REMOTE_PROC_GET_HOSTNAME = 59,
>> +	REMOTE_PROC_DOMAIN_MIGRATE_PERFORM = 60,
>> +	REMOTE_PROC_DOMAIN_MIGRATE_PREPARE = 61,
>>  };
> 
>   I would like to see a distinct third step FINISH, as I think it would be
> more generic.
> 
> [...]
>> + * virDomainMigrate:
>> + * @domain: a domain object
>> + * @dconn: destination host (a connection object)
>> + * @flags: flags
>> + * @dname: (optional) rename domain to this at destination
>> + * @uri: (optional) dest hostname/URI as seen from the source host
>> + * @resource: (optional) specify resource limit in Mbps
>> + *
>> + * Migrate the domain object from its current host to the destination
>> + * host given by dconn (a connection to the destination host).
>> + *
>> + * Flags may be one of more of the following:
>> + *   VIR_MIGRATE_LIVE   Attempt a live migration.
>> + *
>> + * If a hypervisor supports renaming domains during migration,
>> + * then you may set the dname parameter to the new name (otherwise
>> + * it keeps the same name).  If this is not supported by the
>> + * hypervisor, dname must be NULL or else you will get an error.
> 
>   hum, sounds a bit drastic to me, but might be the right thing.
> In any case the capability to rename should be added to the 
> <capabilities><migration_features> probably as a <rename/> optional
> element.
> 
>> + * Since typically the two hypervisors connect directly to each
>> + * other in order to perform the migration, you may need to specify
>> + * a path from the source to the destination.  This is the purpose
>> + * of the uri parameter.  If uri is NULL, then libvirt will try to
>> + * find the best method.  Uri may specify the hostname or IP address
>> + * of the destination host as seen from the source.  Or uri may be
>> + * a URI giving transport, hostname, user, port, etc. in the usual
>> + * form.  Refer to driver documentation for the particular URIs
>> + * supported.
>> + *
>> + * The maximum bandwidth (in Mbps) that will be used to do migration
>> + * can be specified with the resource parameter.  If set to 0,
>> + * libvirt will choose a suitable default.  Some hypervisors do
>> + * not support this feature and will return an error if resource
>> + * is not 0.
> 
>   Do you really want to fail there too ?

I'm not sure ...  It seemed like the safest thing to do since we are 
unable to enforce the requested limit so Bad Things might happen 
(effectively a DoS on the network).

> Similary the capability to 
> limit bandwidth should be added to the <capabilities><migration_features>
> possibly as a <bandwidth/> optional element.

Yes.  Note that although xend takes and indeed requires a resource 
parameter, the implementation in xen 3.1 completely ignores it.  For 
this reason, <bandwidth/> is _not_ a xen 3.1 capability.

>> + * To see which features are supported by the current hypervisor,
>> + * see virConnectGetCapabilities, /capabilities/host/migration_features.
>> + *
>> + * There are many limitations on migration imposed by the underlying
>> + * technology - for example it may not be possible to migrate between
>> + * different processors even with the same architecture, or between
>> + * different types of hypervisor.
>> + *
>> + * Returns the new domain object if the migration was successful,
>> + *   or NULL in case of error.
>> + */
>> +virDomainPtr
>> +virDomainMigrate (virDomainPtr domain,
>> +                  virConnectPtr dconn,
>> +                  unsigned long flags,
>> +                  const char *dname,
>> +                  const char *uri,
>> +                  unsigned long resource)
>> +{
>> +    virConnectPtr conn;
>> +    char *uri_out = NULL;
>> +    char *cookie = NULL;
>> +    int cookielen = 0, ret;
>> +    DEBUG("domain=%p, dconn=%p, flags=%lu, dname=%s, uri=%s, resource=%lu",
>> +          domain, dconn, flags, dname, uri, resource);
>> +
>> +    if (!VIR_IS_DOMAIN (domain)) {
>> +        virLibDomainError(domain, VIR_ERR_INVALID_DOMAIN, __FUNCTION__);
>> +        return NULL;
>> +    }
>> +    conn = domain->conn;        /* Source connection. */
>> +    if (!VIR_IS_CONNECT (dconn)) {
>> +        virLibConnError (conn, VIR_ERR_INVALID_CONN, __FUNCTION__);
>> +        return NULL;
>> +    }
>> +
>> +    /* Check that migration is supported.
>> +     * Note that in the remote case these functions always exist.
>> +     * But the remote driver will throw NO_SUPPORT errors later.
>> +     */
>> +    if (!conn->driver->domainMigratePerform ||
>> +        !dconn->driver->domainMigratePrepare) {
>> +        virLibConnError (conn, VIR_ERR_NO_SUPPORT, __FUNCTION__);
>> +        return NULL;
>> +    }
>> +
>> +    /* Prepare the migration.
>> +     *
>> +     * The destination host may return a cookie, or leave cookie as
>> +     * NULL.
>> +     *
>> +     * The destination host MUST set uri_out if uri_in is NULL.
>> +     *
>> +     * If uri_in is non-NULL, then the destination host may modify
>> +     * the URI by setting uri_out.  If it does not wish to modify
>> +     * the URI, it should leave uri_out as NULL.
>> +     */
>> +    ret = dconn->driver->domainMigratePrepare
>> +        (dconn, &cookie, &cookielen, uri, &uri_out, flags, dname, resource);
>> +    if (ret == -1) return NULL;
>> +    if (uri == NULL && uri_out == NULL) {
>> +        virLibConnError (conn, VIR_ERR_INTERNAL_ERROR,
>> +                         "domainMigratePrepare did not set uri");
>> +        if (cookie) free (cookie);
>> +        return NULL;
>> +    }
>> +    if (uri_out) uri = uri_out; /* Did domainMigratePrepare change URI? */
>> +
>> +    assert (uri != NULL);
>> +
>> +    /* Perform the migration.  The driver isn't supposed to return
>> +     * until the migration is complete.
>> +     */
>> +    ret = conn->driver->domainMigratePerform
>> +        (domain, cookie, cookielen, uri, flags, dname, resource);
>> +
>> +    if (uri_out) free (uri_out);
>> +    if (cookie) free (cookie);
>> +
>> +    if (ret == -1) return NULL;
>> +
>> +    /* Get the destination domain and return it or error. */
>> +    return virDomainLookupByName (dconn, dname ? dname : domain->name);
>> +}
> 
>   Looks fine, but a finish operation returning a virDomainPtr may be
> better. It would carry the cookie and names to identify the migration.
>   Crazy question, what happens if the remote task gets migrated away again
> or just killed before the lookup :-) , I'm not sure we can come up with
> an atomic operation in the general case, but that would be nice to allow
> it in the design if the hypervisor makes it possible.
> 
> [...]
>> --- src/xen_internal.c	12 Jul 2007 08:57:52 -0000	1.85
>> +++ src/xen_internal.c	18 Jul 2007 19:00:20 -0000
>> @@ -2173,6 +2173,12 @@ xenHypervisorMakeCapabilitiesXML(virConn
>>                        "\
>>        </features>\n\
>>      </cpu>\n\
>> +    <migration_features>\n\
>> +      <live/>\n\
> plus
>          <bandwidth/>
>          <rename/>
> 
>> +      <uri_transports>\n\
>> +        <uri_transport>tcp</uri_transport>\n\
>> +      </uri_transports>\n\
>> +    </migration_features>\n\
>>    </host>\n", -1);
>>      if (r == -1) goto vir_buffer_failed;
> [...]
>> +    /* Xen (at least up to 3.1.0) takes a resource parameter but
>> +     * ignores it.
>> +     */
>> +    if (resource) {
>> +        virXendError (conn, VIR_ERR_NO_SUPPORT,
>> +                      "xenDaemonDomainMigrate: Xen does not support resource limits during migration");
>> +        return -1;
>> +    }
> 
>   Hum, strange, the classical Xen bandwidth during live migration of http
> picture uses a 100Mbps capped transfer, so that ought to work at least
> at some point in the past, weird.
> 
> [...]
>> +    /* Set hostname and port.
>> +     *
>> +     * URI is non-NULL (guaranteed by caller).  We expect either
>> +     * "hostname", "hostname:port" or "tcp://hostname[:port]/".
>> +     */
>> +    if (strstr (uri, "//")) {   /* Full URI. */
>> +        xmlURIPtr uriptr = xmlParseURI (uri);
>> +        if (!uriptr) {
>> +            virXendError (conn, VIR_ERR_INVALID_ARG,
>> +                          "xenDaemonDomainMigrate: invalid URI");
>> +            return -1;
>> +        }
>> +        if (uriptr->scheme && STRCASENEQ (uriptr->scheme, "tcp")) {
>> +            virXendError (conn, VIR_ERR_INVALID_ARG,
>> +                          "xenDaemonDomainMigrate: only tcp:// migrations are supported by Xen");
>> +            xmlFreeURI (uriptr);
>> +            return -1;
>> +        }
>> +        if (!uriptr->server) {
>> +            virXendError (conn, VIR_ERR_INVALID_ARG,
>> +                          "xenDaemonDomainMigrate: a hostname must be specified in the URI");
>> +            xmlFreeURI (uriptr);
>> +            return -1;
>> +        }
>> +        hostname = strdup (uriptr->server);
>> +        if (!hostname) {
>> +            virXendError (conn, VIR_ERR_NO_MEMORY, "strdup");
>> +            xmlFreeURI (uriptr);
>> +            return -1;
>> +        }
>> +        if (uriptr->port)
>> +            snprintf (port, sizeof port, "%d", uriptr->port);
>> +        xmlFreeURI (uriptr);
>> +    }
>> +    else if ((p = strrchr (uri, ':')) != NULL) { /* "hostname:port" */
>> +        int port_nr, n;
>> +
>> +        if (sscanf (p+1, "%d", &port_nr) != 1) {
>> +            virXendError (conn, VIR_ERR_INVALID_ARG,
>> +                          "xenDaemonDomainMigrate: invalid port number");
>> +            return -1;
>> +        }
>> +        snprintf (port, sizeof port, "%d", port_nr);
>> +
>> +        /* Get the hostname. */
>> +        n = p - uri; /* n = Length of hostname in bytes. */
>> +        hostname = strdup (uri);
>> +        if (!hostname) {
>> +            virXendError (conn, VIR_ERR_NO_MEMORY, "strdup");
>> +            return -1;
>> +        }
>> +        hostname[n] = '\0';
>> +    }
>> +    else {                      /* "hostname" (or IP address) */
>> +        hostname = strdup (uri);
>> +        if (!hostname) {
>> +            virXendError (conn, VIR_ERR_NO_MEMORY, "strdup");
>> +            return -1;
>> +        }
>> +    }
> 
>   Hum, we say it's an uri, but we interpret is differently if it's not absolute
> this could lead to confusion. But I'm not sure being a purist here would help
> that much from an user POV.
> 
>   In general this looks really good to me. I'm just worried about the 3rd
> part of the operation, I don't think it would be that hard to add, but 
> would IMHO make the protocol at the network level more future-proof,
> if we could avoid breaking it in the future that woud be nice to put
> in now. But that's not a blocker to me.

Rich.

[1] Actually there are various other measures to prevent this from 
happening, including a maximum total message size that we will accept, 
but defense in depth ...