[libvirt] virDomainMemoryPeek & maximum remote message buffer size

Daniel P. Berrange berrange at redhat.com
Wed Jul 9 21:31:39 UTC 2008


On Wed, Jul 09, 2008 at 08:26:47PM +0100, Richard W.M. Jones wrote:
> The kernel images that I want to snoop in virt-mem are around 16 MB in
> size.  In the qemu / KVM case, these images have to travel over the
> remote connection.  Because of limits on the maximum message size,
> they have to travel currently in 64 KB chunks, and it turns out that
> this is slow.  Apparently the dominating factors are how long it takes
> to issue the 'memsave' command in the qemu monitor (there is some big
> constant overhead), and extra network round trips.
> 
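To make the chunking concrete, here is a minimal sketch of the kind of
read loop involved, using the public virDomainMemoryPeek API (the
helper name and error handling are illustrative, not virt-mem's actual
code):

    #include <stddef.h>
    #include <libvirt/libvirt.h>

    #define CHUNK (64 * 1024)   /* current per-call payload limit */

    /* Read 'total' bytes of guest memory starting at virtual address
     * 'start', one 64 KB peek at a time.  Each iteration costs a full
     * remote round trip plus a 'memsave' monitor command, which is
     * where the constant overhead dominates. */
    static int
    peek_kernel(virDomainPtr dom, unsigned long long start,
                size_t total, char *buf)
    {
        size_t done = 0;

        while (done < total) {
            size_t n = total - done < CHUNK ? total - done : CHUNK;

            if (virDomainMemoryPeek(dom, start + done, n, buf + done,
                                    VIR_MEMORY_VIRTUAL) < 0)
                return -1;
            done += n;
        }
        return 0;
    }
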
> The current remote message size is intentionally limited to 256KB
> (fully serialized, including all XDR headers and overhead), so the
> most we could practically send in a single message at the moment is
> 128KB if we stick to powers of two, or ~255KB if we don't.
> 
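For reference, both limits are constants in the XDR protocol
definition (remote_protocol.x); the names and values below are quoted
from memory, so treat them as illustrative:

    const REMOTE_MESSAGE_MAX = 262144;                  /* 256 KB, fully serialized */
    const REMOTE_DOMAIN_MEMORY_PEEK_BUFFER_MAX = 65536; /* 64 KB of payload per call */
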
> The reason we limit it is to avoid denial of service attacks, where a
> rogue client or server sends excessively large messages and causes the
> peer to allocate lots of memory [eg. if we didn't have any limit, then
> you could send a message which was several GB in size and cause
> problems at the other end, because the message is slurped in before it
> is fully parsed].
> 
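The usual shape of that defence, sketched in C (read_exact is a
hypothetical helper that loops until the requested byte count has
arrived):

    #include <stdint.h>
    #include <stdlib.h>
    #include <arpa/inet.h>

    #define MESSAGE_MAX (256 * 1024)

    /* Hypothetical helper: read exactly 'len' bytes or fail. */
    extern int read_exact(int fd, char *buf, size_t len);

    /* Read the 4-byte length prefix and validate it *before*
     * allocating or reading the message body. */
    static char *
    read_message(int fd, uint32_t *lenp)
    {
        uint32_t len;
        char *body;

        if (read_exact(fd, (char *) &len, sizeof len) < 0)
            return NULL;
        len = ntohl(len);
        if (len > MESSAGE_MAX)          /* reject before allocating */
            return NULL;
        if (!(body = malloc(len)))
            return NULL;
        if (read_exact(fd, body, len) < 0) {
            free(body);
            return NULL;
        }
        *lenp = len;
        return body;
    }
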
> There is a second problem with reading the kernel in small chunks,
> namely that this allows the virtual machine to make a lot of progress,
> so we don't get anything near an 'instantaneous' snapshot (getting the
> kernel in a single chunk doesn't necessarily guarantee this either,
> but it's better).
> 
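(An aside not raised in the mail itself: if a truly consistent image
is required, the guest can be paused around the reads with the public
virDomainSuspend/virDomainResume calls, whatever the chunk size.  A
rough sketch, reusing the hypothetical peek_kernel helper from above:

    /* Pause the guest so the image cannot change between chunks,
     * at the cost of stopping it for the duration of the copy. */
    static int
    snapshot_kernel(virDomainPtr dom, unsigned long long start,
                    size_t total, char *buf)
    {
        int ret;

        if (virDomainSuspend(dom) < 0)
            return -1;
        ret = peek_kernel(dom, start, total, buf);
        if (virDomainResume(dom) < 0)
            ret = -1;
        return ret;
    }
)
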
> As an experiment, I tried increasing the maximum message to 32 MB, so
> that I could send the whole kernel in one go.
>
> Unfortunately just increasing the limit doesn't work for two reasons,
> one prosaic and one very weird:
> 
> (1) The current code likes to keep message buffers on the stack, and
> because Linux limits the stack to something artificially small, this
> fails.  Increasing the stack ulimit is a short-term fix for this,
> while testing.  In the long term we could rewrite any code which does
> this to use heap buffers instead.

Yeah, we should fix this. I've had a patch for refactoring the main
dispatch method pending for quite a while which dramatically reduces
stack usage.
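
A hedged illustration of the kind of change involved (the names are
made up, not the actual dispatch code): replacing a maximum-size stack
buffer with a heap allocation.

    #define REMOTE_MESSAGE_MAX (256 * 1024)

    /* Before: 256 KB (or, with a 32 MB limit, far more) on the
     * stack, which blows through the default thread stack size. */
    char buffer[REMOTE_MESSAGE_MAX];

    /* After: the same buffer on the heap, freed when dispatch ends. */
    char *buffer = malloc(REMOTE_MESSAGE_MAX);
    if (!buffer)
        return -1;
    /* ... dispatch using buffer ... */
    free(buffer);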

> (2) There is some really odd problem with our use of recv(2) which
> causes messages > 64 KB to fail.  I have no idea what is really
> happening, but the sequence of events seems to be this:
> 
>   server                             client
> 
>   write(sock,buf,len) = len-k
>                                      recv(sock,buf,len) = len-k
> 
>   write(sock,buf+len-k,k) = k
> 
>                                      recv(sock,buf+len-k,k) = 0 [NOT k]

Bizarre. The docs quite clearly say:

   These calls return the number of bytes received, or -1 if an error occurred.
   The return value will be 0 when the peer has performed an orderly shutdown.

So it's clearly thinking there's a shutdown here.
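
For what it's worth, the standard defensive pattern distinguishes a
short read from that orderly-shutdown case explicitly; nothing
libvirt-specific here, just plain POSIX recv(2):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <errno.h>

    /* Read exactly 'len' bytes, looping over partial reads.
     * Returns 0 on success, -1 on error, -2 on EOF (peer shutdown). */
    static int
    recv_exact(int fd, char *buf, size_t len)
    {
        size_t done = 0;

        while (done < len) {
            ssize_t n = recv(fd, buf + done, len - done, 0);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (n == 0)     /* the 'orderly shutdown' case above */
                return -2;
            done += n;
        }
        return 0;
    }

That obviously doesn't explain why a shutdown is being seen at all,
but it rules out mis-handling of short reads as the culprit.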

Were you doing this over the UNIX socket, or over TCP?  If the latter,
you might want to turn off all authentication and use the raw TCP
socket, to ensure none of the encryption routines are in use.
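
For reference, the standard connection URIs that select each transport
(assuming the usual remote driver naming):

    qemu:///system             local UNIX domain socket
    qemu+tcp://host/system     TCP, no TLS (SASL auth optional)
    qemu+tls://host/system     TCP with TLS encryption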

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



