[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [PATCHv8 16/17] qemu: Report cache occupancy (CMT) with domstats




> -----Original Message-----
> From: John Ferlan [mailto:jferlan redhat com]
> Sent: Thursday, November 15, 2018 12:18 AM
> To: Wang, Huaqiang <huaqiang wang intel com>; libvir-list redhat com
> Cc: Feng, Shaohe <shaohe feng intel com>; Ding, Jian-feng <jian-
> feng ding intel com>; Zang, Rui <rui zang intel com>
> Subject: Re: [PATCHv8 16/17] qemu: Report cache occupancy (CMT) with
> domstats
> 
> 
> 
> On 11/12/18 8:31 AM, Wang Huaqiang wrote:
> > Adding the interface in qemu to report CMT statistic information
> > through command 'virsh domstats --cpu-total'.
> >
> > Below is a typical output:
> >
> >          # virsh domstats 1 --cpu-total
> >          Domain: 'ubuntu16.04-base'
> >            ...
> >            cpu.cache.monitor.count=2
> >            cpu.cache.monitor.0.name=vcpus_1
> >            cpu.cache.monitor.0.vcpus=1
> >            cpu.cache.monitor.0.bank.count=2
> >            cpu.cache.monitor.0.bank.0.id=0
> >            cpu.cache.monitor.0.bank.0.bytes=4505600
> >            cpu.cache.monitor.0.bank.1.id=1
> >            cpu.cache.monitor.0.bank.1.bytes=5586944
> >            cpu.cache.monitor.1.name=vcpus_4-6
> >            cpu.cache.monitor.1.vcpus=4,5,6
> >            cpu.cache.monitor.1.bank.count=2
> >            cpu.cache.monitor.1.bank.0.id=0
> >            cpu.cache.monitor.1.bank.0.bytes=17571840
> >            cpu.cache.monitor.1.bank.1.id=1
> >            cpu.cache.monitor.1.bank.1.bytes=29106176
> >
> > Signed-off-by: Wang Huaqiang <huaqiang wang intel com>
> > ---
> >  src/libvirt-domain.c   |   9 +++
> >  src/qemu/qemu_driver.c | 198
> > +++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 207 insertions(+)
> >
> > diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index
> > 7690339..4895f9f 100644
> > --- a/src/libvirt-domain.c
> > +++ b/src/libvirt-domain.c
> > @@ -11345,6 +11345,15 @@ virConnectGetDomainCapabilities(virConnectPtr
> conn,
> >   *     "cpu.user" - user cpu time spent in nanoseconds as unsigned long long.
> >   *     "cpu.system" - system cpu time spent in nanoseconds as unsigned long
> >   *                    long.
> > + *     "cpu.cache.monitor.count" - tocal cache monitoring groups
> > + *     "cpu.cache.monitor.M.name" - name of cache monitoring group 'M'
> > + *     "cpu.cache.monitor.M.vcpus" - vcpus for cache monitoring group 'M'
> > + *     "cpu.cache.monitor.M.bank.count" - total bank number of cache
> monitoring
> > + *                    group 'M'
> > + *     "cpu.cache.monitor.M.bank.N.id" - OS assigned cache bank id for cache
> > + *                    'N' in cache monitoring group 'M'
> > + *     "cpu.cache.monitor.M.bank.N.bytes" - monitor's cache occupancy of
> cache
> > + *                    bank 'N' in cache monitoring group 'M'
> 
> I'll comment on these in your update...
> 
> >   *
> >   * VIR_DOMAIN_STATS_BALLOON:
> >   *     Return memory balloon device information.
> > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index
> > 89d46ee..d41ae66 100644
> > --- a/src/qemu/qemu_driver.c
> > +++ b/src/qemu/qemu_driver.c
> > @@ -19698,6 +19698,199 @@ typedef enum {  #define HAVE_JOB(flags)
> > ((flags) & QEMU_DOMAIN_STATS_HAVE_JOB)
> >
> >
> > +typedef struct _virQEMUCpuResMonitorData virQEMUCpuResMonitorData;
> > +typedef virQEMUCpuResMonitorData *virQEMUCpuResMonitorDataPtr;
> struct
> > +_virQEMUCpuResMonitorData{
> 
> Data {
> 

Got. One space before '{'
 
> > +    const char *name;
> > +    char *vcpus;
> > +    virResctrlMonitorType tag;
> > +    virResctrlMonitorStatsPtr stats;
> > +    size_t nstats;
> > +};
> > +
> > +
> > +static int
> > +qemuDomainGetCpuResMonitorData(virDomainObjPtr dom,
> > +                               virQEMUCpuResMonitorDataPtr mondata) {
> > +    virDomainResctrlDefPtr resctrl = NULL;
> > +    size_t i = 0;
> > +    size_t j = 0;
> > +    size_t l = 0;
> > +
> > +    for (i = 0; i < dom->def->nresctrls; i++) {
> > +        resctrl = dom->def->resctrls[i];
> > +
> > +        for (j = 0; j < resctrl->nmonitors; j++) {
> > +            virDomainResctrlMonDefPtr domresmon = NULL;
> > +            virResctrlMonitorPtr monitor =
> > + resctrl->monitors[j]->instance;
> > +
> > +            domresmon = resctrl->monitors[j];
> > +            mondata[l].tag = domresmon->tag;
> 
> Why "l" (BTW: 'k' would be preferred because '1' and 'l' are very difficult to
> delineate).

This 'l' and the occurrences in below will be substituted with 'k'.

> 
> > +
> > +            /* If virBitmapFormat successfully returns an vcpu string, then
> > +             * mondata[l].vcpus is assigned with an memory space holding it,
> > +             * let this newly allocated memory buffer to be freed along with
> > +             * the free of 'mondata' */
> > +            if (!(mondata[l].vcpus = virBitmapFormat(domresmon->vcpus)))
> > +                return -1;
> > +
> > +            if (!(mondata[l].name = virResctrlMonitorGetID(monitor))) {
> > +                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> > +                               _("Could not get monitor ID"));
> > +                return -1;
> > +            }
> > +
> > +            if (domresmon->tag == VIR_RESCTRL_MONITOR_TYPE_CACHE) {
> 
> Something doesn't quite add up with this... Since we're only filling in types with
> 'cache' types and erroring out otherwise ... see [1] data points below...
> 
> > +                if (virResctrlMonitorGetCacheOccupancy(monitor,
> > +                                                       &mondata[l].stats,
> > +                                                       &mondata[l].nstats) < 0)
> > +                    return -1;
> > +            } else {
> > +                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> > +                               _("Invalid CPU resource type"));
> > +                return -1;
> 
> [1] Perhaps this should be done up front and "avoided" since this helper should
> only care above getting cache stats...

Regarding this error message, it might over-checked since we have made safety check
In building *domresmon->tag.
Will remove the error report lines, since *domresmon->tag could only be 
VIR_RESCTRL_MONITOR_TYPE_CACHE currently.

[R1]:
In my plan this function is to be used for VIR_RESCTRL_MONITOR_TYPE_CACHE and
VIR_RESCTRL_MONITOR_TYPE_MEMBW.

Beside this function, qemuDomainGetStatsCpuResMonitorPerTag (name will be refined)
and qemuDomainGetStatsCpuResMonitor (name will be refined)  are also planned to support
both VIR_RESCTRL_MONITOR_TYPE_CACHE type monitor and 
both VIR_RESCTRL_MONITOR_TYPE_MEMBW monitor even the names are specified with
word 'CpuRes'. 

For me memBW monitor is also in scope of 'CPU'. Here is my understanding:

1. CAT/CMT is technology for cache, obviously it is belong to scope of 'CPU'. 

2. It may  make you confused from the name of MBM, memory bandwidth monitoring, but it get
the memory bandwidth utilization information by tracking L3 cache usage perf CPU thread not by
reading counters of DRAM, so MBM technology is more close to cache than DRAM controller.
This is the reason I think MBM is also a technology in scope of CPU.
With some links for MBM and kernel resctrl for your reference:
https://software.intel.com/en-us/articles/introduction-to-memory-bandwidth-monitoring
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

Based on above understandings, in naming the three functions that newly introduced I chose name
with word 'cpures' (cpu resource). That I think cache is cpu resource and memBW is cpu resource, the
new functions will handle both resource types. So in this patch my plan is getting cache and memory 
bandwidth  statistics in one function qemuDomainGetStatsCpuResMonitor, the interface connected
to 'domstats' function is written as:

*** One function solution ***
static int
qemuDomainGetStatsCpu(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
                      virDomainObjPtr dom,
                      virDomainStatsRecordPtr record,
                      int *maxparams,
                      unsigned int privflags ATTRIBUTE_UNUSED)
{
     if (qemuDomainGetStatsCpuCgroup(dom, record, maxparams) < 0)
         return -1;

     /* Get cache and memory bandwidth statistics */                                  <-- One function for both cache and memBW
     if (qemuDomainGetStatsCpuResMonitor(dom, record, maxparams) < 0)
         return -1;

     return 0;
}

Otherwise, if you still think it is not wise to process memBW information and cache in one function,
they have very obvious boundary, then we'd better add two functions and not using word 'cpu resource'/cpures.
Like these:

*** Two functions solution ***
static int
qemuDomainGetStatsCpu(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
                      virDomainObjPtr dom,
                      virDomainStatsRecordPtr record,
                      int *maxparams,
                      unsigned int privflags ATTRIBUTE_UNUSED)
{
     if (qemuDomainGetStatsCpuCgroup(dom, record, maxparams) < 0)
         return -1;

     /* Get cache statistics */                                                                                             <-- function only for cache.
     if (qemuDomainGetStatsCacheMonitor(dom, record, maxparams) < 0)          
         return -1;
}

     /*A new function for getting memory bandwidth statistics */
static int
qemuDomainGetStatsMemStat(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
                      virDomainObjPtr dom,
                      virDomainStatsRecordPtr record,
                      int *maxparams,
                      unsigned int privflags ATTRIBUTE_UNUSED)
{

 /* Read memBW monitors and output ... 

memorybandwidth.monitor.count=...
memorybandwidth.monitor.0.name=...
memorybandwidth.monitor.0.vcpus=...
...
*/
     return 0;
}

How do you think? Which one do you prefer, one function interface or two functions interface? 
(function names might be refined.)

> 
> IOW:
> 
>        for (j = 0; j < resctrl->nmonitors; j++) {
>            virDomainResctrlMonDefPtr domresmon = NULL;
>            virResctrlMonitorPtr monitor = resctrl->monitors[j]->instance;
> 
>            domresmon = resctrl->monitors[j];
> 
>            if (domresmon->tag != VIR_RESCTRL_MONITOR_TYPE_CACHE)
>                continue;
> 
> This this loop only fetches cache information and then the 'l' (or preferably 'k')
> makes more sense

Maybe before memBW is introduced, it might be better to make code more clear just
as you wrote, for memBW then we make changes.

But I still need your opinion on using one function interface or interface of two separate functions
for cache and memory statistics.

Now we will add output of 'domstats' something like:
cpu.cache.monitor.count = ...
cpu.cache.monitor.0.name=...
cpu.cache.monitor.0.vcpus=...
...

Is following arrangement for memBW monitor is not acceptable? 

cpu.memorybandwidth.monitor.count=...
cpu.memorybandwidth.monitor.0.name=...
cpu.memorybandwidth.monitor.0.vcpus=...
...


> 
> > +            }
> > +
> > +            l++;
> > +        }
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +
> 
> Might be nice to add comments...
> 
> > +static int
> > +qemuDomainGetStatsCpuResMonitorPerTag(virQEMUCpuResMonitorDataPtr
> mondata,
> > +                                      size_t nmondata,
> > +                                      virResctrlMonitorType tag,
> > +                                      virDomainStatsRecordPtr record,
> > +                                      int *maxparams) {
> > +    char param_name[VIR_TYPED_PARAM_FIELD_LENGTH];
> > +    unsigned int nmonitors = 0;
> > +    const char *resname = NULL;
> > +    const char *resnodename = NULL;
> > +    size_t i = 0;
> > +
> > +    for (i = 0; i < nmondata; i++) {
> > +        if (mondata[i].tag == tag)
> > +            nmonitors++;
> > +    }
> > +
> > +    if (!nmonitors)
> > +        return 0;
> 
> I'd place the above two below the following hunk - perf wise and logically since
> @tag is the important piece here.  However, it may not be important to
> compare the [i].tag == tag as long as you've done your collection of *only* one
> type.  Hope this makes sense!
> 
> Thus your code is just:
> 
>     if (nmondata == 0)
>         return 0;
> 

Yes. At least currently no need to add up *nmoitors again.

> > +
> > +    if (tag == VIR_RESCTRL_MONITOR_TYPE_CACHE) {
> > +        resname = "cache";
> > +        resnodename = "bank";
> > +    } else if (tag == VIR_RESCTRL_MONITOR_TYPE_MEMBW) {
> > +        resname = "memBW";
> > +        resnodename = "node";
> 
> [1] In the current scheme of qemuDomainGetStatsCpuResMonitor, how could
> we get this tag?

Another tag, the VIR_RESCTRL_MONITOR_TYPE_MEMBW, will be added to 
qemuDomainGetStatsCpuResMonitor if memBW monitor is introduced in my
plan.

But I know you are challenging this plan. I'll make change according to your
suggestion. 

> 
> If your goal was to make a "generic" printing API to be used by both cpustats
> and memstats (eventually), then perhaps the name of the helper should just be
> qemuDomainGetStatsResMonitor (or *MonitorParams). Since @tag is a
> parameter and it's fairly easy to see it's use and the formatting of the params is
> based purely on the tag in the generically output data.
> 
> The helper should also be appropriately placed in qemu_driver.c such that when
> memBW stats support is added it can just be used and doesn't need to be moved.
> 

As I stated in [R1], I look memstats as one kind of CPU resources. 

If we chose the two function scheme, things will be considered as you stated.


> > +    } else {
> > +        return 0;
> > +    }
> > +
> > +    snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
> > +             "cpu.%s.monitor.count", resname);
> > +    if (virTypedParamsAddUInt(&record->params, &record->nparams,
> > +                              maxparams, param_name, nmonitors) < 0)
> > +        return -1;
> > +
> > +    for (i = 0; i < nmonitors; i++) {
> > +        size_t l = 0;
> 
> Similar 'l' concerns here use 'j' instead

'l' will be modified to 'j'.

> 
> > +
> > +        snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
> > +                 "cpu.%s.monitor.%zd.name", resname, i);
> > +        if (virTypedParamsAddString(&record->params,
> > +                                    &record->nparams,
> > +                                    maxparams,
> > +                                    param_name,
> > +                                    mondata[i].name) < 0)
> > +            return -1;
> > +
> > +        snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
> > +                 "cpu.%s.monitor.%zd.vcpus", resname, i);
> > +        if (virTypedParamsAddString(&record->params, &record->nparams,
> > +                                    maxparams, param_name,
> > +                                    mondata[i].vcpus) < 0)
> > +            return -1;
> > +
> > +        snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
> > +                 "cpu.%s.monitor.%zd.%s.count", resname, i, resnodename);
> > +        if (virTypedParamsAddUInt(&record->params, &record->nparams,
> > +                                  maxparams, param_name,
> > +                                  mondata[i].nstats) < 0)
> > +            return -1;
> > +
> > +        for (l = 0; l < mondata[i].nstats; l++) {
> > +            snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
> > +                     "cpu.%s.monitor.%zd.%s.%zd.id",
> > +                     resname, i, resnodename, l);
> > +            if (virTypedParamsAddUInt(&record->params, &record->nparams,
> > +                                      maxparams, param_name,
> > +                                      mondata[i].stats[l].id) < 0)
> > +                return -1;
> > +
> > +            snprintf(param_name, VIR_TYPED_PARAM_FIELD_LENGTH,
> > +                     "cpu.%s.monitor.%zd.%s.%zd.bytes",
> > +                     resname, i, resnodename, l);
> > +            if (virTypedParamsAddUInt(&record->params, &record->nparams,
> > +                                      maxparams, param_name,
> > +                                      mondata[i].stats[l].val) < 0)
> > +                return -1;
> > +        }
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +
> > +static int
> > +qemuDomainGetStatsCpuResMonitor(virDomainObjPtr dom,
> > +                                virDomainStatsRecordPtr record,
> > +                                int *maxparams) {
> > +    virDomainResctrlDefPtr resctrl = NULL;
> > +    virQEMUCpuResMonitorDataPtr mondata = NULL;
> > +    unsigned int nmonitors = 0;
> > +    size_t i = 0;
> > +    int ret = -1;
> > +
> > +    if (!virDomainObjIsActive(dom))
> > +        return 0;
> > +
> > +    for (i = 0; i < dom->def->nresctrls; i++) {
> > +        resctrl = dom->def->resctrls[i];
> > +        nmonitors += resctrl->nmonitors;
> 
> [1] To further the above conversation, @nmonitors won't be limited by cache
> stats only, but the collection of the @mondata is. So I think here nmonitors
> should only include tags w/ *TYPE_CACHE.
> 

> > +    }
> > +
> > +    if (!nmonitors)
> > +        return 0;
> > +
> > +    if (VIR_ALLOC_N(mondata, nmonitors) < 0)
> > +        return -1;
> > +
> > +    if (qemuDomainGetCpuResMonitorData(dom, mondata) < 0)
> 
> You could pass @nmonitors here and use as 'nmondata' in the above function to
> "ensure" that the nmonitors looping and filling of mondata doesn't go beyond
> bounds - although that shouldn't happen, so it's not a requirement as long as the
> algorithm to calculate @nmonitors and allocate @mondata is the same
> algorithm used to fill in @mondata.
> 

qemuDomainGetCpuResMonitorData gets all monitor statistics and stored in mondata, in my
design not only for cache statistics.
But since we only have cache monitor currently I'll do as you suggested. 


> > +        goto cleanup;
> > +
> > +    for (i = VIR_RESCTRL_MONITOR_TYPE_UNSUPPORT + 1;
> > +         i < VIR_RESCTRL_MONITOR_TYPE_LAST; i++) {
> > +        if (qemuDomainGetStatsCpuResMonitorPerTag(mondata, nmonitors, i,
> > +                                                  record, maxparams) < 0)
> > +            goto cleanup;
> > +    }
> 
> Similar comment here - this is only being called from qemuDomainGetStatsCpu
> thus it should only pass VIR_RESCTRL_MONITOR_TYPE_CACHE and not be run in
> a loop. If eventually memBW data was filled in - it would be printed next to cpu-
> stats data and that doesn't necessarily make sense, does it?
>

Only for cache, OK. 
 
> 
> > +
> > +    ret = 0;
> > + cleanup:
> > +    for (i = 0; i < nmonitors; i++)
> > +        VIR_FREE(mondata[i].vcpus);
> > +    VIR_FREE(mondata);
> > +
> > +    return ret;
> > +}
> > +
> > +
> >  static int
> >  qemuDomainGetStatsCpuCgroup(virDomainObjPtr dom,
> >                              virDomainStatsRecordPtr record, @@
> > -19747,6 +19940,11 @@ qemuDomainGetStatsCpu(virQEMUDriverPtr driver
> > ATTRIBUTE_UNUSED,  {
> >      if (qemuDomainGetStatsCpuCgroup(dom, record, maxparams) < 0)
> >          return -1;
> > +
> > +    if (qemuDomainGetStatsCpuResMonitor(dom, record, maxparams) < 0)
> > +        return -1;
> > +
> > +    return 0;
> 
> This obviously would have some differences based on my comments from
> patch15.
> 

Got. Still remember what changes you made. 

> Rather than have you post patches 1-15 again just to fix 16, I'll push
> 1-15 and let you work on 16 and 17.  We still have time before the next release.
> 

I am looking forward for your attitude toward whether I could regard 'memBW monitor'(MBM)
as a kind of CPU resource in libvirt and report the memory bandwidth statistics by command
' virsh domstats --cpu-total'?

I still think MBM might have big difference in comparing with DRAM memory bandwidth, because
the cache hierarchy has been significantly changed since CPU platform Skylake-SP, in Skyalke-SP
not all data to CPU from DRAM is through L3 cache. Data might be read to L2 cache directly from
DRAM. 
I am looking for answers regard the difference of MBM observed bandwidth and DRAM bandwidth
from internal team. I will make update once get answers.

> Besides once I push, we'll find out fairly quickly whether some other arch has
> build/compile problems!
> 
> John
> 

Thanks for review.
Huaqiang

> >  }
> >
> >
> >


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]