[libvirt] [RFC PATCH 2/2] lib: Add note that bulk stats API queries may overrun RPC buffers

Tue May 23 08:26:59 UTC 2017

On Tue, May 23, 2017 at 10:12:06AM +0200, Martin Kletzander wrote:
> On Mon, May 22, 2017 at 06:00:13PM +0200, Peter Krempa wrote:
> > Hint that the users should limit the number of VMs queried in the bulk
> > stats API.
> > ---
> > src/libvirt-domain.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> > 
> > diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
> > index 310b91b37..b01f2705c 100644
> > --- a/src/libvirt-domain.c
> > +++ b/src/libvirt-domain.c
> > @@ -11315,6 +11315,10 @@ virConnectGetDomainCapabilities(virConnectPtr conn,
> >  * VIR_CONNECT_GET_ALL_DOMAINS_STATS_SHUTOFF and/or
> >  * VIR_CONNECT_GET_ALL_DOMAINS_STATS_OTHER for all other states.
> >  *
> > + * Note that this API is prone to exceeding maximum RPC message size on hosts
> > + * with lots of VMs so it's suggested to use virDomainListGetStats with a
> > + * reasonable list of VMs as the argument.
> > + *
> >  * Returns the count of returned statistics structures on success, -1 on error.
> >  * The requested data are returned in the @retStats parameter. The returned
> >  * array should be freed by the caller. See virDomainStatsRecordListFree.
> > @@ -11386,6 +11390,9 @@ virConnectGetAllDomainStats(virConnectPtr conn,
> >  * Note that any of the domain list filtering flags in @flags may be rejected
> >  * by this function.
> >  *
> > + * Note that this API is prone to exceeding maximum RPC message size on hosts
> > + * with lots of VMs so it's suggested to limit the number of VMs queried.
> > + *
> 
> Makes sense to me, now we can at least point people somewhere when this
> happens again.  ACK from me, but feel free to wait if you want to really
> make this an RFC and you want more opinions.
> 
> Is there a discussion about how to continue for the future?  I remember
> Michal starting some discussion, but I'm not sure whether that was about
> the same thing.  Anyway, that's just a rhetorical question so that I can
> say the two ideas I have:
> 
> 1) Just increase the limit over time.  Computers and networks are
>    getting faster, there's more storage space and memory, and so on.
>    It only makes sense to do scale other things respectively.
> 
> 2) Have a new API that streams the data back over virStream.  We can
>    then do it continuously (like every X seconds), that might help
>    management apps as well because I suspect that's how they use this
>    API anyway.
> 
> Thanks for listening ;)

Before we do either of those we should consider just changing the RPC
message format for the APIs in question.

Essentially have, say, 10 statistics we are grabbing for every VM.
On the wire we are encoding

  <name 1>: <value for stat 1, VM 1>
  <name 2>: <value for stat 2, VM 1>
  ...
  <name N>: <value for stat N, VM 1>

  <name 1>: <value for stat 1, VM 2>
  <name 2>: <value for stat 2, VM 2>
  ...
  <name N>: <value for stat N, VM 2>

  <name 1>: <value for stat 1, VM 3>
  <name 2>: <value for stat 2, VM 3>
  ...
  <name N>: <value for stat N, VM 3>

where <name XX> is a long string that is basically the same for
every VM, while <value> is just an int64.

We could change this so that we have a table of names at the
start

  <name 1>: <index 1>
  <name 2>: <index 2>
  ...
  <name N>: <index N>

and then for each VM we have

  <index 1>: <value for stat 1, VM 1>
  <index 2>: <value for stat 2, VM 1>
  ...
  <index N>: <value for stat N, VM 1>

Soo the name string is no longer repeated, and in its place is an
integer index. If every name is say 12 characters long, and the
index is a 4 byte int, we've reduced a 20 byte packet per stat
to 12 bytes.

That's just one idea, possibly/probably not even the best - there's
various other ways we could encode the data to make it more efficient.

We could perhaps even consider changing the public API too, because
the typed parameters are a really inefficient way to exporting the
data in the API too

If you want to get more radical with a push based solution, then I
would suggest we consider setting up shared memory segment between
libvirt & the client app where we just continuously update the data,
avoiding RPC entirely.

This would only work for apps running locally of course, but at a
large scale, it seems the major apps using libvirt have all taken
the approach of having a local libvirtd connection, rather than
trying to use our remote RPC at data center scale levels from
a central host.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|