[libvirt] [PATCH 1/6] Add new API virDomainStreamDisk[Info] to header and drivers
Adam Litke
agl at us.ibm.com
Mon May 9 17:51:16 UTC 2011
On 05/09/2011 11:09 AM, Daniel P. Berrange wrote:
> On Mon, May 02, 2011 at 04:29:49PM -0500, Adam Litke wrote:
>> After several long distractions, I am back to working on disk streaming.
>> Before I hit the list with a new series of patches, I was hoping to
>> reach some middle ground on the proposed streaming API.
>>
>> On Fri, 2011-04-08 at 14:31 +0100, Daniel P. Berrange wrote:
>>> On Thu, Apr 07, 2011 at 04:31:59PM -0500, Adam Litke wrote:
>>>> /*
>>>> + * Disk Streaming
>>>> + */
>>>> +typedef enum {
>>>> + VIR_STREAM_DISK_ONE = 1, /* Stream a single disk unit */
>>>> + VIR_STREAM_DISK_START = 2, /* Stream the entire disk */
>>>> + VIR_STREAM_DISK_STOP = 4, /* Stop streaming a disk */
>>>> +} virDomainStreamDiskFlags;
>>>
>>> Using flags to combine two separate tasks into one single API
>>> is rather unpleasant. As raised in the previous patch, the API
>>> should also be taking a offset+length in bytes, then there is
>>> no need for a special case transfer of an individual sector.
>>
>> This is a fair point. I will work with Stefan to support an
>> offset/length qemu API. Since there doesn't seem to be a great way to
>> query device sizes, I think we do need a convenient way to request that
>> the entire disk be streamed. We could either do this with a flag or by
>> overriding (offset==0&& length==0) to mean stream the entire device.
>> Do you have a preference?
>
> Since length==0 is otherwise meaningless, it is fine to use that
> to indicate "until end of disk". This is consistent with
> what we do for virStorageVolUpload/Download which allow length=0
> to indicate "until end of disk".
>
>>>> +#define VIR_STREAM_PATH_BUFLEN 1024
>>>> +#define VIR_STREAM_DISK_MAX_STREAMS 10
>>>> +
>>>> +typedef struct _virStreamDiskState virStreamDiskState;
>>>> +struct _virStreamDiskState {
>>>> + char path[VIR_STREAM_PATH_BUFLEN];
>>>> + /*
>>>> + * The unit of measure for size and offset is unspecified. These fields
>>>> + * are meant to indicate the progress of a continuous streaming operation.
>>>> + */
>>>> + unsigned long long offset; /* Current offset of active streaming */
>>>> + unsigned long long size; /* Disk size */
>>>> +};
>>>> +typedef virStreamDiskState *virStreamDiskStatePtr;
>>>> +
>>>> +unsigned long long virDomainStreamDisk(virDomainPtr dom,
>>>> + const char *path,
>>>> + unsigned long long offset,
>>>> + unsigned int flags);
>>>> +
>>>> +int virDomainStreamDiskInfo(virDomainPtr dom,
>>>> + virStreamDiskStatePtr states,
>>>> + unsigned int nr_states,
>>>> + unsigned int flags);
>>>
>>> I would have liked it if we could use the existing JobInfo APIs for
>>> getting progress information, but if we need to allow concurrent
>>> usage for multiple disks per-VM, we can't. I think we should still
>>> use a similar style of API though.
>>
>> The goal is to eventually support concurrent streams. Therefore, we
>> will probably need to roll our own Status and Abort APIs (see my
>> proposed API below).
>>
>>> There also doesn't appear to be a precise way to determine if the
>>> copying of an entire disk failed part way through, and if so, how
>>> much was actually copied.
>>>
>>> Taking all the previous points together, I think the API needs to
>>> look like this:
>>>
>>> typedef enum {
>>> /* If set, virDomainBlockAllocate() will return immediately
>>> * allowing polling for operation completion& status
>>> */
>>> VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK,
>>> } virDomainBlockAllocateFlags;
>>>
>>> /*
>>> * @path: fully qualified filename of the virtual disk
>>
>> I probably misnamed it, but path is actually the device alias, not a
>> path to an image file.
>
> Hmm, I wonder whether that is a good choice or not. The other
> APIs all use the disk path. Perhaps we could use that as default
> and have a flag to indicate whether the path should be treated as
> a device alias instead. Thus getting both options.
Does libvirt have an option to lookup a device alias by disk path? If
so, then I am happy to use the file path and convert it to the form that
qemu expects.
>>
>>> * @offset: logical position in bytes, within the virtual disk
>>> * @length: amount of data to copy, in bytes starting from @offset
>>> * @flags: One of virDomainBlockAllocateFlags, or zero
>>> *
>>> * Ensure the virtual disk @path is fully allocated
>>> * between @offset and @offset+ at length. If a backing
>>> * store is present, data will be filled from the
>>> * backing store, otherwise data will be fileld with
>>> * zeros
>>> *
>>> * If @flags contains VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK,
>>> * this API will return immediately after initiating the
>>> * copy, otherwise it will block until copying is complete
>>> *
>>> */
>>> int virDomainBlockAllocate(virDomainPtr dom,
>>> const char *path,
>>> unsigned long long offset,
>>> unsigned long long length,
>>> unsigned int flags);
>>>
>>>
>>> /*
>>> * @path: fully qualified filename of the virtual disk
>>> * @info: allocated struct to return progress info
>>> *
>>> * Query the progress of a disk allocation job. This
>>> * API must be used when virDomainBlockAllocate() was
>>> * invoked with the VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK
>>> * flag set.
>>> *
>>> * The @info.type field will indicate whether the job
>>> * was completed successfully, or failed part way
>>> * through.
>>> *
>>> * The @info data progress fields will contain current
>>> * progress information.
>>> *
>>> * The hypervisor driver may optionally chose to also
>>> * fillin a time estimate for completion.
>>> */
>>> int virDomainBlockGetJobInfo(virDomainPtr dom,
>>> const char *path,
>>> virDomainJobInfoPtr info);
>>>
>>> /*
>>> * @path: fully qualified filename of the virtual disk
>>> *
>>> * Request that a disk allocation job be aborted at
>>> * the soonest opportunity. This API can only be used
>>> * when virDomainBlockAllocate() was invoked with the
>>> * VIR_DOMAIN_DISK_ALLOCATE_NONBLOCK flag set.
>>> */
>>> int virDomainBlockAbortJob(virDomainPtr dom,
>>> const char *path);
>>>
>>>
>>> typedef struct _virDomainBlockRegion virDomainBlockRegion;
>>> typedef virDomainBlockRegion *virDomainBlockRegionPtr;
>>> struct _virDomainBlockRegion {
>>> /* The logical offset within the file of the allocated region */
>>> unsigned long long offset;
>>> /* The length of the allocated region */
>>> unsigned long long length;
>>> };
>>>
>>> /*
>>> * @path: fully qualified filename of the virtual disk
>>> * @nregions: filled in the number of @region structs
>>> * @regions: filled with a list of allocated regions
>>> *
>>> * Query the extents of allocated regions within the
>>> * virtual disk file. The offsets in the list of regions
>>> * are not guarenteed to be sorted in any explicit order.
>>> */
>>> int virDomainBlockGetAllocationMap(virDomainPtr dom,
>>> const char *path,
>>> unsigned int *nregions,
>>> virDomainBlockRegionPtr *regions);
>>
>> I am not convinced that we really need the block allocation map stuff.
>> It's a fun toy, but the layout of data within a device is far too
>> low-level of a concept to be exposing to users. In my opinion,
>> streaming should really be done sequentially from the start of the
>> device to the end (with arbitrary chunk sizes allowed for throttling
>> purposes). If an application wants to stream according to some special
>> pattern, let them maintain a data structure outside of libvirt to manage
>> that extra complexity. It's not an error to stream a chunk that is
>> already present. The populated areas will just complete immediately.
>> Therefore, there isn't a need to maintain a strict record of outstanding
>> regions.
>
> Well we can always add something like this in at a later
> date, so it is fine to drop it.
>
>>> This takes care of things for running guests. It would be
>>> desirable to have the same functionality available when a
>>> guest is not running, via the virStorageVol APIs. Indeed,
>>> this would allow access to the allocation functionality
>>> for disks not explicitly associated with any VM yet.
>>
>> In light of previous discussion about the complexities around storage
>> formats and filesystems, I am not sure how useful such an offline API is
>> going to be in practice. At any rate, I would prefer to consider that
>> issue separately.
>>
>> So, with my points taken into account I would like to counter with the
>> following API proposal:
>>
>> ==== snip ====
>>
>> typedef enum {
>> /* If set, virDomainBlockAllocate() will return immediately
>> * allowing polling for operation completion& status
>> */
>> VIR_DOMAIN_DISK_STREAM_NONBLOCK,
>> } virDomainBlockStreamFlags;
>>
>> /*
>> * @device: alias of the target block device
>> * @offset: logical position in bytes, within the virtual disk
>> * @length: amount of data to copy, in bytes starting from @offset
>> * @flags: One of virDomainBlockAllocateFlags, or zero
>> *
>> * Ensure the virtual disk @device is fully allocated
>> * between @offset and @offset+ at length. If a backing
>> * store is present, data will be filled from the
>> * backing store, otherwise data will be fileld with
>> * zeros.
>> *
>> * If @flags contains VIR_DOMAIN_DISK_STREAM_NONBLOCK,
>> * this API will return immediately after initiating the
>> * copy, otherwise it will block until copying is complete
>> *
>> */
>> int virDomainBlockStream(virDomainPtr dom,
>> const char *device,
>> unsigned long long offset,
>> unsigned long long length,
>> unsigned int flags);
>>
>> /*
>> * @device: alias of the target block device
>> *
>> * Request that a disk streaming job be aborted at
>> * the soonest opportunity. This API can only be used
>> * when virDomainBlockStream() was invoked with the
>> * VIR_DOMAIN_DISK_STREAM_NONBLOCK flag set.
>> */
>> int virDomainBlockStreamAbort(virDomainPtr dom,
>> const char *device);
>>
>> typedef enum {
>> VIR_BLOCK_STREAM_STATE_COMPLETED = 0,
>> VIR_BLOCK_STREAM_STATE_ACTIVE = 1,
>> VIR_BLOCK_STREAM_STATE_FAILED = 2,
>> } virBlockStreamStatus;
>>
>> typedef struct _virBlockStreamInfo virBlockStreamInfo;
>> struct _virBlockStreamInfo {
>> virBlockStreamStatus status;
>
> Coding guidelines don't allow use of enum types in structs
> or as API parameters, instead requiring 'int'.
OK.
>> unsigned long long remaining; /* number of bytes remaining */
>> };
>> typedef virBlockStreamInfo *virBlockStreamInfoPtr;
>
> s/virBlock/virDomainBlock/ in several places here.
>
>>
>> /*
>> * @device: alias of the target block device
>> * @info: allocated struct to return progress info
>> *
>> * Query the progress of a disk streaming job. This
>> * API must be used when virDomainBlockStream() was
>> * invoked with the VIR_DOMAIN_DISK_STREAM_NONBLOCK
>> * flag set.
>> *
>> */
>> int virDomainBlockStreamInfo(virDomainPtr dom,
>> const char *device,
>> virBlockStreamInfoPtr info);
>
> Any reason you didn't just use the existing 'virDomainJobInfoPtr'
> struct, with this new API ? In particular I think we want many
> of the other fields from that struct beyond just 'remaining'.
It looks like I can use this structure without problems but given the
desire to eventually support simultaneous streams, we'll still need to
write our own start/stop/status routines.
More information about the libvir-list
mailing list