[libvirt] [PATCH 00/27] Introduce sparse streams
John Ferlan
jferlan at redhat.com
Thu May 5 15:17:33 UTC 2016
On 04/28/2016 06:04 AM, Michal Privoznik wrote:
> So, after couple of sleepless nights and headaches I'm proud to
> announce that finally got this working.
>
>
> What?
> Our regular streams that are can be used to transfer disk images
> for domains are unaware of any sparseness. Therefore they have
> two limitations:
>
> a) transferring big but sparse image can take ages as all the
> holes (interpreted by kernel as '\0') have to go through our
> event loop.
> b) resulting volume is not sparse even if the source was.
>
>
> How?
> I went by verified approach that linux kernel has. One way to
> look at our streams is just like read() and write() with a
> different names: virStreamRecv() and virStreamSend(). They even
> have the same prototype (if 'int fd' is substituted with
> 'virStreamPtr'). Now, holes in files are created and detected via
> third API: lseek(). Therefore I'm introducing new virStreamSkip()
> API that mimics the missing primitive. Now, because our streams
> do not necessarily have to work over files (they are for generic
> data transfer), I had to let users register a callback that is
> called whenever the other side calls virStreamSkip().
>
> So now that we have all three primitives, we can focus on making
> life easier for our users. Nobody is actually using bare
> virStreamSend() and virStreamRecv() rather than our wrappers:
> virStreamSendAll() and virStreamRecvAll(). With my approach
> described above just virStreamSendAll() needs to be adjusted so
> that it's 'sparse file' aware. The virStreamRecvAll() will only
> get the data to write (just like it is now) with skip callback
> called automatically whenever needed. In order for
> virStreamSendAll() to skip holes I'm introducing yet another
> callback: virStreamInDataFunc(). This callback will help us to
> create a map of a file: before each virStreamSend() it checks
> whether we are in a data section or a hole and calls
> virStreamSend() or virStreamSkip() respectively.
>
> Do not worry - it will all become clear once you see the code.
>
> Now question is - how will users enable this feature? I mean, we
> have take into account that we might be talking to an older
> daemon that does not know how to skip a hole. Or vice versa -
> older client.
> The solution I came up with is to introduce flags to APIs where
> sparse streams make sense. I guess it makes sense for volume
> upload and download, but almost certainly makes no sense for
> virDomainOpenConsole().
>
>
> Code?
>>From users POV they just need to pass correct argument to
> 'vol-upload' or 'vol-download' virsh commands. One layer down, on
> programming level they need to merely:
>
> st = virStreamNew(conn, 0);
> virStreamRegisterSkip(st, skipFunc, &fd);
> virStorageVolDownload(st, ...);
> virStreamRecvAll(st, sinkFunc, &fd);
>
> where:
>
> int skipFunc(virStreamPtr st,
> unsigned long long offset, void *opaque)
> {
> int *fd = opaque;
> return lseek(*fd, offset, SEEK_CUR) == (off_t) -1 ? -1 : 0;
> }
>
> And for uploading it's slightly more verbose - see patch 24.
>
>
> Limitations?
> While testing this on my machine with XFS, I've noticed that the
> resulting map of a transferred file is not exactly the same as
> the source's. Checksums are the same though. After digging deeper
> I found this commit in the kernel:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=055388a3188f56676c21e92962fc366ac8b5cb72
>
> Thing is, as we transfer the file, we are practically just
> seeking at EOF and thus creating holes. But if the hole size is
> small enough, XFS will use some speculative file allocation
> algorithm and eventually fully allocate the blocks even if we
> intended to create a hole. This does not occur when punching a
> hole into a file though. Well, I guess XFS devels have some
> reasons to do that.
>
> This behaviour has not been observed on EXT4.
>
>
> Notes?
> Oh, patches 01-03 have been ACKed already, but are not pushed yet
> because of the freeze. But since this feature build on the top of
> them, I'm sending them too.
>
> Also the whole patch set is accessible at my github:
>
> https://github.com/zippy2/libvirt/tree/sparse_streams4
>
>
> Michal Privoznik (27):
> Revert "rpc: Fix slow volume download (virsh vol-download)"
> virnetclientstream: Process stream messages later
> virStream{Recv,Send}All: Increase client buffer
> Introduce virStreamSkip
> Introduce virStreamRegisterSkip and virStreamSkipCallback
> Introduce virStreamInData and virStreamRegisterInData
> virNetClientStreamNew: Track origin stream
> Track if stream is seekable
> RPC: Introduce virNetStreamSkip
> Introduce VIR_NET_STREAM_SKIP message type
> Teach wireshark plugin about VIR_NET_STREAM_SKIP
> daemon: Implement VIR_NET_STREAM_SKIP handling
> daemon: Introduce virNetServerProgramSendStreamSkip
> virnetclientstream: Introduce virNetClientStreamSendSkip
> virnetclientstream: Introduce virNetClientStreamHandleSkip
> remote_driver: Implement virStreamSkip
> daemonStreamHandleRead: Wire up seekable stream
> virNetClientStream: Wire up VIR_NET_STREAM_SKIP
> virStreamSendAll: Wire up sparse streams
> fdstream: Implement seek
> gendispatch: Introduce @sparseflag for our calls
> Introduce virStorageVol{Download,Upload}Flags
> virsh: Implement sparse stream to vol-download
> virsh: Implement sparse stream to vol-upload
> fdstream: Suppress use of IO helper for sparse streams
> daemon: Don't call virStreamInData so often
> storage: Enable sparse streams for virStorageVol{Download,Upload}
>
> daemon/remote.c | 2 +-
> daemon/stream.c | 134 +++++++++++++--
> daemon/stream.h | 3 +-
> include/libvirt/libvirt-storage.h | 9 +
> include/libvirt/libvirt-stream.h | 47 ++++++
> src/datatypes.h | 8 +
> src/driver-stream.h | 5 +
> src/fdstream.c | 156 +++++++++++++++---
> src/fdstream.h | 3 +-
> src/libvirt-storage.c | 4 +-
> src/libvirt-stream.c | 238 ++++++++++++++++++++++++++-
> src/libvirt_internal.h | 7 +
> src/libvirt_private.syms | 2 +
> src/libvirt_public.syms | 7 +
> src/libvirt_remote.syms | 2 +
> src/remote/remote_driver.c | 41 ++++-
> src/remote/remote_protocol.x | 2 +
> src/rpc/gendispatch.pl | 21 ++-
> src/rpc/virnetclient.c | 1 +
> src/rpc/virnetclientstream.c | 308 ++++++++++++++++++++++++-----------
> src/rpc/virnetclientstream.h | 10 +-
> src/rpc/virnetprotocol.x | 16 +-
> src/rpc/virnetserverprogram.c | 33 ++++
> src/rpc/virnetserverprogram.h | 7 +
> src/storage/storage_backend.c | 12 +-
> src/storage/storage_driver.c | 4 +-
> src/virnetprotocol-structs | 4 +
> tools/virsh-volume.c | 40 ++++-
> tools/virsh.c | 79 +++++++++
> tools/virsh.h | 12 ++
> tools/virsh.pod | 6 +-
> tools/wireshark/src/packet-libvirt.c | 40 +++++
> tools/wireshark/src/packet-libvirt.h | 2 +
> 33 files changed, 1104 insertions(+), 161 deletions(-)
>
Read the code mostly for educational purposes, but also ran through
Coverity...
First the coverity notes:
Patch 11: dissect_xdr_stream_skip() - "start" is not initialized
Patch 15: virNetClientStreamHandleSkip() - if (!msg || ...) check then
dereferences msg in the virReportError. So it seems !msg needs its own
check and message.
Couple of questions/thoughts about skipping in general.
Is it possible to skip past some sort of EOF? IOW: What if skip
'offset' is larger than perceived maximum size of the file? I didn't dig
line to line, but just a general thought along the lines of the either
not so bright or even worse malicious code that now has a means to fill
up a disk with a mostly empty file.
In Patch 23, your commit message indicates enabling skips for specific
backends - perhaps batter to note "local" vs "remote" type operations?
For iSCSI, it's possible to configure a disk/lun such that the target
file is found in the /dev/disk/by-path (mode="host").
John
More information about the libvir-list
mailing list