[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] RFC [1/3]: The internal lock manager API



  Looks fine on principles, but there is some polishing to be provided,
I especially think that we need to carefully examine all failure
scenarios and document the expected behaviour as part of the API.
After all it's an API to improve garantee of correct execution so we
need to be fairly precise there.

On Fri, Sep 10, 2010 at 05:00:07PM +0100, Daniel P. Berrange wrote:
> 
> /**
>  * virLockManager:
>  *
>  * A lock manager is a process that will supervise another
>  * process. It will obtain & hold locks/leases for resources
>  * the supervised process uses
>  */
> 
> typedef struct virLockManager virLockManager;
> typedef virLockManager *virLockManagerPtr;
> 
> typedef enum {
>     VIR_LOCK_MANAGER_RESOURCE_READONLY = (1 << 0),
>     VIR_LOCK_MANAGER_RESOURCE_SHARED   = (1 << 1),
> } virLockManagerResourceFlags;
> 
> enum {
>     VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK = 0,
> } virLockManagerResourceType;
> 
> enum {
>     VIR_LOCK_MANAGER_NEW_DOMAIN = 0,
> } virLockManagerNewType;
> 
> enum {
>     VIR_LOCK_MANAGER_NEW_MIGRATE = (1 << 0),
>     VIR_LOCK_MANAGER_NEW_ATTACH  = (1 << 1),
> } virLockManagerNewFlags;
> 
> enum {
>     VIR_LOCK_MANAGER_MIGRATE_CANCEL = (1 << 0);
> } virLockManagerCompleteMigrateFlags;
> 
> 
> /**
>  * virLockManagerNew:
>  * @driver: the driver implementation to use
>  * @type: the type of process to be supervised
>  * @flags: one of virLockManagerNewFlags
>  *
>  * Create a new context to supervise a process, usually
>  * a virtual machine. For a normal startup, flags can
>  * be 0. For incoming migration, use VIR_LOCK_MANAGER_NEW_MIGRATE
>  *
>  * Returns a new lock manager context

 or NULL in case of failure

>  */
> virLockManagerPtr virLockManagerNew(virLockManagerDriverPtr driver,
>                                     unsigned int type,
>                                     unsigned int flags);

  What do you envision as being other type use case than process used for
running a domain ?

> /**
>  * virLockManagerSetParameter:
>  * @manager: the lock manager context
>  * @key: the parameter name
>  * @value: the parameter value
>  *
>  * Set a configuration parameter for the managed process.
>  * A process of VIR_LOCK_MANAGER_START_DOMAIN will be
>  * given at least 3 parameters:
>  *
>  * - id: the domain unique id
>  * - uuid: the domain uuid
>  * - name: the domain name
>  *
>  * There may be other parameters specific to the lock manager
>  * plugin that are provided for the managed process
>  *
>  * This should only be called prior to the supervised process
>  * being started. Setting parameters may change the set of
>  * argv returned by virLockManagerGetSupervisorArgs.
>  */
> int virLockManagerSetParameter(virLockManagerPtr manager,
>                                const char *key,
>                                const char *value);
> 

  Hum ... this API feels a bit weak, I guess by the attempt of being
flexible. I think we should define precisely the strings to be passed
as key I because being an internal API I don't see any point in keeping
floating keys values.
If we know the name of the keys used by the drivers, the drivers being
internal, then I would prefer use an enum for key.

 The error handling need to be precised, I think there is 3 cases:
   - success
   - an hard error, like an allocation error
   - the lock manager doesn't know/use the key
we could explicitely merge the last one with success, but this should
be described, to be sure it's the equivalent of a no-op  then.


> /**
>  * virLockManagerRegisterResource:
>  * @manager: the lock manager context
>  * @type: the resource type virLockManagerResourceType
>  * @name: the resource name
>  * @flags: the resource access flags
>  *
>  * Register a resource that is required when the process
>  * starts up. This may only be called prior to the process
>  * being started. The format of @name varies according to
>  * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK
>  * will have a fully qualified file path.
>  *
>  * If no flags are given, the resource is assumed to be
>  * used in exclusive, read-write mode. Access can be
>  * relaxed to readonly, or shared read-write.
>  */
> int virLockManagerRegisterResource(virLockManagerPtr manager,
>                                    unsigned int type,
>                                    const gchar *name,

  why gchar here instead of char ?

>                                    unsigned int flags);

  I think failure mode need to be made clear here. Basically an error
should mean that the domain creation aborts, and it's basically an
internal error, since we don't expect the lock manager to take the
lock just now (that should be precised,  would the lock manager be
allowed to try to grab the resource immediately ?)
  Error could come from parameter being invalid (an unrecognized flag
for example), an internal allocation error, or calling at the wrong
time. It think in all cases that mean we should abort creation.

> /**
>  * virLockManagerAcquireResource:
>  * @manager: the lock manager context
>  * @type: the resource type virLockManagerResourceType
>  * @name: the resource name
>  * @flags: the resource access flags
>  *
>  * Dynamically acquire a resource for a running process.
>  * This may only be called once the process has been
>  * started. The format of @name varies according to
>  * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK
>  * will have a fully qualified file path.
>  *
>  * If no flags are given, the resource is assumed to be
>  * used in exclusive, read-write mode. Access can be
>  * relaxed to readonly, or shared read-write.
>  */
> int virLockManagerAcquireResource(virLockManagerPtr manager,
>                                   const gchar *file,

  why gchar here instead of char ?

>                                   unsigned int flags);

 Again failure mode need to be clarified. I see 4 kind of errors
possible:
  - parameter error
  - internal error (allocation for example)
  - operation being unsupported by the lock manager
  - failure to acquire the resource

  It think in all cases from a domain POV it's equivalent to a no-op
and hence non-fatal for the domain, but this should be clarified.

> /**
>  * virLockManagerReleaseResource:
>  * @manager: the lock manager context
>  * @type: the resource type virLockManagerResourceType
>  * @name: the resource name
>  * @flags: the resource access flags
>  *
>  * Dynamically release a resource for a running process.
>  * This may only be called after the process has been
>  * started. The format of @name varies according to
>  * the resource @type. A VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK
>  * will have a fully qualified file path.
>  *
>  * If no flags are given, the resource is assumed to be
>  * used in exclusive, read-write mode. Access can be
>  * relaxed to readonly, or shared read-write.
>  */
> int virLockManagerReleaseResource(virLockManagerPtr manager,
>                                   const gchar *file,

  why gchar here instead of char ?

>                                   unsigned int flags);
> 
> /**
>  * virLockManager:

  GetSupervisorPath

>  * @manager: the lock manager context
>  *
>  * Retrieve the path of the supervisor binary
>  */
> char *virLockManagerGetSupervisorPath(virLockManagerPtr manager);

  what if the lock manager is in-libvirtd ? Return NULL ?
We have a possibility of memory allocation error so that's another
failure possible.

> /**
>  * virLockManager:
>  * @manager: the lock manager context
>  * 
>  * Retrieve the security context of the supervisor binary
>  */
> char *virLockManagerGetSupervisorContext(virLockManagerPtr manager);
> 
> 
> /**
>  * virLockManager:
>  * @manager: the lock manager context
>  * @argv: returned argument values
>  * @argc: returned argument count

   * @flags: unused yet use 0

>  *
>  * Retrieve custom argv to pass to the supervisor binary
>  * ahead of the normal process binary & argv. It is recommended
>  * that the last argv be '--' to allow reliable determination
>  * of the last supervisor arg.
>  */
> int virLockManagerGetSupervisorArgs(virLockManagerPtr manager,
>                                     char ***argv,
>                                     int *argc,
>                                     unsigned int flags);
> 
> /**
>  * virLockManagerPrepareMigrate:
>  * @manager: the lock manager context
>  * @targetURI: destination of the migration
>  *
>  * Notify the supevisor that the process is about to be migrated
>  * to another host at @targetURI
>  */
> int virLockManagerPrepareMigrate(virLockManagerPtr manager,
>                                  const char *targetURI,
>                                  unsigned int flags);
> /**
>  * virLockManagerCompleteMigrateIn:
>  * @manager: the lock manager context
>  * @sourceURI: the origin of the migration

   * @flags: combination of virLockManagerCompleteMigrateFlags

>  *
>  * Notify the supervisor that the process has completed
>  * migration. If the migration is aborted, then the @flags
>  * should be VIR_LOCK_MANAGER_MIGRATE_CANCEL
>  */
> int virLockManagerCompleteMigrateIn(virLockManagerPtr manager,
>                                     const char *sourceURI,
>                                     unsigned int flags);

  We should specify that this is to be called only on the migration
target node, right?
  What is the error handling, what would failure indicate there ?

> /**
>  * virLockManagerCompleteMigrateOut:
>  * @manager: the lock manager context
>  * @targetURI: the destination of the migration

   * @flags: combination of virLockManagerCompleteMigrateFlags

>  *
>  * Notify the supervisor that the process has completed
>  * migration. If the migration is aborted, then the @flags
>  * should be VIR_LOCK_MANAGER_MIGRATE_CANCEL
>  */
> int virLockManagerCompleteMigrateOut(virLockManagerPtr manager,
>                                      const char *targetURI,
>                                      unsigned int flags);
> 

  We should specify that this is to be called only on the migration
source node, right?
  What is the error handling, what would failure indicate there ?

> 
> /**
>  * virLockManagerGetChild:
>  * @manager: the lock manager context

   * @flags: unused yet use 0

>  *
>  * Obtain the PID of the managed process
>  */
> int virLockManagerGetChild(virLockManagerPtr manager,
>                            pid_t *pid,
>                            unsigned int flags);
> 
> 
> /**
>  * virLockManager:

  Shutdown

>  * @manager: the lock manager context

   * @flags: unused yet use 0

>  *
>  * Inform the supervisor that the supervised process has
>  * been, or can be stopped.
>  */
> int virLockManagerShutdown(virLockManagerPtr manager,
>                            unsigned int flags);

  Except errors in parameter do we envision any failure ?

It seems to me this could be called multiple time in a row without
prejudice, if that's the case then we should document it

> /**
>  * virLockManager:

 Free

>  * @manager: the lock manager context
>  *
>  * Release resources. If the supervised process is still
>  * running, it will be killed with prejudice
>  */
> void virLockManagerFree(virLockManagerPtr manager);

 same as for virLockManagerShutdown what would be the failure mode.
But the most important point I guess is does the lock manager provide a
way to tell it's done with a guest. We could have scenarios where
the guest disapears but the lock manager is hung for example trying to
contact a disapeared network device. The domain being down doesn't mean
the lock manager is.

  As said I think the API in general is a nice abstraction for what we
need but the devil really is in the detail of the operation model, and
that must include the error cases. There many ways something may go
wrong but we should have relaible expectation about how this is handled
and reported across multiple implementations,

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]