[libvirt] [PATCH 00/27] Introduce sparse streams

John Ferlan jferlan at redhat.com
Thu May 5 15:17:33 UTC 2016



On 04/28/2016 06:04 AM, Michal Privoznik wrote:
> So, after couple of sleepless nights and headaches I'm proud to
> announce that finally got this working.
> 
> 
> What?
> Our regular streams that are can be used to transfer disk images
> for domains are unaware of any sparseness. Therefore they have
> two limitations:
> 
> a) transferring big but sparse image can take ages as all the
> holes (interpreted by kernel as '\0') have to go through our
> event loop.
> b) resulting volume is not sparse even if the source was.
> 
> 
> How?
> I went by verified approach that linux kernel has. One way to
> look at our streams is just like read() and write() with a
> different names: virStreamRecv() and virStreamSend(). They even
> have the same prototype (if 'int fd' is substituted with
> 'virStreamPtr'). Now, holes in files are created and detected via
> third API: lseek(). Therefore I'm introducing new virStreamSkip()
> API that mimics the missing primitive. Now, because our streams
> do not necessarily have to work over files (they are for generic
> data transfer), I had to let users register a callback that is
> called whenever the other side calls virStreamSkip().
> 
> So now that we have all three primitives, we can focus on making
> life easier for our users. Nobody is actually using bare
> virStreamSend() and virStreamRecv() rather than our wrappers:
> virStreamSendAll() and virStreamRecvAll(). With my approach
> described above just virStreamSendAll() needs to be adjusted so
> that it's 'sparse file' aware. The virStreamRecvAll() will only
> get the data to write (just like it is now) with skip callback
> called automatically whenever needed. In order for
> virStreamSendAll() to skip holes I'm introducing yet another
> callback: virStreamInDataFunc(). This callback will help us to
> create a map of a file: before each virStreamSend() it checks
> whether we are in a data section or a hole and calls
> virStreamSend() or virStreamSkip() respectively.
> 
> Do not worry - it will all become clear once you see the code.
> 
> Now question is - how will users enable this feature? I mean, we
> have take into account that we might be talking to an older
> daemon that does not know how to skip a hole. Or vice versa -
> older client.
> The solution I came up with is to introduce flags to APIs where
> sparse streams make sense. I guess it makes sense for volume
> upload and download, but almost certainly makes no sense for
> virDomainOpenConsole().
> 
> 
> Code?
>>From users POV they just need to pass correct argument to
> 'vol-upload' or 'vol-download' virsh commands. One layer down, on
> programming level they need to merely:
> 
> st = virStreamNew(conn, 0);
> virStreamRegisterSkip(st, skipFunc, &fd);
> virStorageVolDownload(st, ...);
> virStreamRecvAll(st, sinkFunc, &fd);
> 
> where:
> 
> int skipFunc(virStreamPtr st,
>              unsigned long long offset, void *opaque)
> {
>     int *fd = opaque;
>     return lseek(*fd, offset, SEEK_CUR) == (off_t) -1 ? -1 : 0;
> }
> 
> And for uploading it's slightly more verbose - see patch 24.
> 
> 
> Limitations?
> While testing this on my machine with XFS, I've noticed that the
> resulting map of a transferred file is not exactly the same as
> the source's. Checksums are the same though. After digging deeper
> I found this commit in the kernel:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=055388a3188f56676c21e92962fc366ac8b5cb72
> 
> Thing is, as we transfer the file, we are practically just
> seeking at EOF and thus creating holes. But if the hole size is
> small enough, XFS will use some speculative file allocation
> algorithm and eventually fully allocate the blocks even if we
> intended to create a hole. This does not occur when punching a
> hole into a file though. Well, I guess XFS devels have some
> reasons to do that.
> 
> This behaviour has not been observed on EXT4.
> 
> 
> Notes?
> Oh, patches 01-03 have been ACKed already, but are not pushed yet
> because of the freeze. But since this feature build on the top of
> them, I'm sending them too.
> 
> Also the whole patch set is accessible at my github:
> 
> https://github.com/zippy2/libvirt/tree/sparse_streams4
> 
> 
> Michal Privoznik (27):
>   Revert "rpc: Fix slow volume download (virsh vol-download)"
>   virnetclientstream: Process stream messages later
>   virStream{Recv,Send}All: Increase client buffer
>   Introduce virStreamSkip
>   Introduce virStreamRegisterSkip and virStreamSkipCallback
>   Introduce virStreamInData and virStreamRegisterInData
>   virNetClientStreamNew: Track origin stream
>   Track if stream is seekable
>   RPC: Introduce virNetStreamSkip
>   Introduce VIR_NET_STREAM_SKIP message type
>   Teach wireshark plugin about VIR_NET_STREAM_SKIP
>   daemon: Implement VIR_NET_STREAM_SKIP handling
>   daemon: Introduce virNetServerProgramSendStreamSkip
>   virnetclientstream: Introduce virNetClientStreamSendSkip
>   virnetclientstream: Introduce virNetClientStreamHandleSkip
>   remote_driver: Implement virStreamSkip
>   daemonStreamHandleRead: Wire up seekable stream
>   virNetClientStream: Wire up VIR_NET_STREAM_SKIP
>   virStreamSendAll: Wire up sparse streams
>   fdstream: Implement seek
>   gendispatch: Introduce @sparseflag for our calls
>   Introduce virStorageVol{Download,Upload}Flags
>   virsh: Implement sparse stream to vol-download
>   virsh: Implement sparse stream to vol-upload
>   fdstream: Suppress use of IO helper for sparse streams
>   daemon: Don't call virStreamInData so often
>   storage: Enable sparse streams for virStorageVol{Download,Upload}
> 
>  daemon/remote.c                      |   2 +-
>  daemon/stream.c                      | 134 +++++++++++++--
>  daemon/stream.h                      |   3 +-
>  include/libvirt/libvirt-storage.h    |   9 +
>  include/libvirt/libvirt-stream.h     |  47 ++++++
>  src/datatypes.h                      |   8 +
>  src/driver-stream.h                  |   5 +
>  src/fdstream.c                       | 156 +++++++++++++++---
>  src/fdstream.h                       |   3 +-
>  src/libvirt-storage.c                |   4 +-
>  src/libvirt-stream.c                 | 238 ++++++++++++++++++++++++++-
>  src/libvirt_internal.h               |   7 +
>  src/libvirt_private.syms             |   2 +
>  src/libvirt_public.syms              |   7 +
>  src/libvirt_remote.syms              |   2 +
>  src/remote/remote_driver.c           |  41 ++++-
>  src/remote/remote_protocol.x         |   2 +
>  src/rpc/gendispatch.pl               |  21 ++-
>  src/rpc/virnetclient.c               |   1 +
>  src/rpc/virnetclientstream.c         | 308 ++++++++++++++++++++++++-----------
>  src/rpc/virnetclientstream.h         |  10 +-
>  src/rpc/virnetprotocol.x             |  16 +-
>  src/rpc/virnetserverprogram.c        |  33 ++++
>  src/rpc/virnetserverprogram.h        |   7 +
>  src/storage/storage_backend.c        |  12 +-
>  src/storage/storage_driver.c         |   4 +-
>  src/virnetprotocol-structs           |   4 +
>  tools/virsh-volume.c                 |  40 ++++-
>  tools/virsh.c                        |  79 +++++++++
>  tools/virsh.h                        |  12 ++
>  tools/virsh.pod                      |   6 +-
>  tools/wireshark/src/packet-libvirt.c |  40 +++++
>  tools/wireshark/src/packet-libvirt.h |   2 +
>  33 files changed, 1104 insertions(+), 161 deletions(-)
> 

Read the code mostly for educational purposes, but also ran through
Coverity...

First the coverity notes:

Patch 11: dissect_xdr_stream_skip() - "start" is not initialized
Patch 15: virNetClientStreamHandleSkip() - if (!msg || ...) check then
dereferences msg in the virReportError.  So it seems !msg needs its own
check and message.

Couple of questions/thoughts about skipping in general.

Is it possible to skip past some sort of EOF?  IOW: What if skip
'offset' is larger than perceived maximum size of the file? I didn't dig
line to line, but just a general thought along the lines of the either
not so bright or even worse malicious code that now has a means to fill
up a disk with a mostly empty file.

In Patch 23, your commit message indicates enabling skips for specific
backends - perhaps batter to note "local" vs "remote" type operations?
For iSCSI, it's possible to configure a disk/lun such that the target
file is found in the /dev/disk/by-path (mode="host").

John




More information about the libvir-list mailing list