[libvirt] [PATCH v4 1/2] Fix nodeinfo output on PPC64 KVM hosts
Martin Kletzander
mkletzan at redhat.com
Tue Jul 7 12:51:51 UTC 2015
On Tue, Jul 07, 2015 at 09:25:59AM +0200, Andrea Bolognani wrote:
>From: Shivaprasad G Bhat <sbhat at linux.vnet.ibm.com>
>
>The nodeinfo is reporting incorrect number of cpus and incorrect host
>topology on PPC64 KVM hosts. The KVM hypervisor on PPC64 needs only
>the primary thread in a core to be online, and the secondaries offlined.
>While scheduling a guest in, the kvm scheduler wakes up the secondaries to
>run in guest context.
>
>The host scheduling of the guests happen at the core level(as only primary
>thread is online). The kvm scheduler exploits as many threads of the core
>as needed by guest. Further, starting POWER8, the processor allows splitting
>a physical core into multiple subcores with 2 or 4 threads each. Again, only
>the primary thread in a subcore is online in the host. The KVM-PPC
>scheduler allows guests to exploit all the offline threads in the subcore,
>by bringing them online when needed.
>(Kernel patches on split-core http://www.spinics.net/lists/kvm-ppc/msg09121.html)
>
>Recently with dynamic micro-threading changes in ppc-kvm, makes sure
>to utilize all the offline cpus across guests, and across guests with
>different cpu topologies.
>(https://www.mail-archive.com/kvm@vger.kernel.org/msg115978.html)
>
>Since the offline cpus are brought online in the guest context, it is safe
>to count them as online. Nodeinfo today discounts these offline cpus from
>cpu count/topology calclulation, and the nodeinfo output is not of any help
>and the host appears overcommited when it is actually not.
>
>The patch carefully counts those offline threads whose primary threads are
>online. The host topology displayed by the nodeinfo is also fixed when the
>host is in valid kvm state.
>
>Signed-off-by: Shivaprasad G Bhat <sbhat at linux.vnet.ibm.com>
>Signed-off-by: Andrea Bolognani <abologna at redhat.com>
>---
> src/libvirt_private.syms | 1 +
> src/nodeinfo.c | 138 ++++++++++++++++++++++++++++++++++++++++++-----
> src/nodeinfo.h | 1 +
> 3 files changed, 127 insertions(+), 13 deletions(-)
>
>diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
>index 1566d11..64644a2 100644
>--- a/src/libvirt_private.syms
>+++ b/src/libvirt_private.syms
>@@ -1008,6 +1008,7 @@ nodeGetInfo;
> nodeGetMemory;
> nodeGetMemoryParameters;
> nodeGetMemoryStats;
>+nodeGetThreadsPerSubcore;
> nodeSetMemoryParameters;
>
>
>diff --git a/src/nodeinfo.c b/src/nodeinfo.c
>index 2fafe2d..0b78d7d 100644
>--- a/src/nodeinfo.c
>+++ b/src/nodeinfo.c
>@@ -32,6 +32,12 @@
> #include <sys/utsname.h>
> #include <sched.h>
> #include "conf/domain_conf.h"
>+#include <fcntl.h>
>+#include <sys/ioctl.h>
>+
>+#if HAVE_LINUX_KVM_H
>+# include <linux/kvm.h>
>+#endif
>
> #if defined(__FreeBSD__) || defined(__APPLE__)
> # include <sys/time.h>
>@@ -428,28 +434,86 @@ virNodeParseNode(const char *node,
> unsigned int cpu;
> int online;
> int direrr;
>+ int lastonline;
>+ virBitmapPtr cpu_map = NULL;
>+ int threads_per_subcore = 0;
>
> *threads = 0;
> *cores = 0;
> *sockets = 0;
>
>+ /* PPC-KVM needs the secondary threads of a core to be offline on the
>+ * host. The kvm scheduler brings the secondary threads online in the
>+ * guest context. Moreover, P8 processor has split-core capability
>+ * where, there can be 1,2 or 4 subcores per core. The primaries of the
>+ * subcores alone will be online on the host for a subcore in the
>+ * host. Even though the actual threads per core for P8 processor is 8,
>+ * depending on the subcores_per_core = 1, 2 or 4, the threads per
>+ * subcore will vary accordingly to 8, 4 and 2 repectively.
>+ * So, On host threads_per_core what is arrived at from sysfs in the
>+ * current logic is actually the subcores_per_core. Threads per subcore
>+ * can only be obtained from the kvm device. For example, on P8 wih 1
>+ * core having 8 threads, sub_cores_percore=4, the threads 0,2,4 & 6
>+ * will be online. The sysfs reflects this and in the current logic
>+ * variable 'threads' will be 4 which is nothing but subcores_per_core.
>+ * If the user tampers the cpu online/offline states using chcpu or other
>+ * means, then it is an unsupported configuration for kvm.
>+ * The code below tries to keep in mind
>+ * - when the libvirtd is run inside a KVM guest or Phyp based guest.
>+ * - Or on the kvm host where user manually tampers the cpu states to
>+ * offline/online randomly.
>+ * On hosts other than POWER this will be 0, in which case a simpler
>+ * thread-counting logic will be used */
>+ if ((threads_per_subcore = nodeGetThreadsPerSubcore(arch)) < 0)
>+ goto cleanup;
>+
>+ /* Keep track of node CPUs in a bitmap so that we can iterate
>+ * through them in guaranteed numeric order, which is required to
>+ * find out whether a thread is primary or secondary */
>+ if ((cpu_map = virBitmapNew(VIR_DOMAIN_CPUMASK_LEN)) == NULL)
>+ goto cleanup;
>+
> if (!(cpudir = opendir(node))) {
> virReportSystemError(errno, _("cannot opendir %s"), node);
> goto cleanup;
> }
>
>- /* enumerate sockets in the node */
>- CPU_ZERO(&sock_map);
> while ((direrr = virDirRead(cpudir, &cpudirent, node)) > 0) {
> if (sscanf(cpudirent->d_name, "cpu%u", &cpu) != 1)
> continue;
>
>+ if (virBitmapSetBit(cpu_map, cpu) < 0)
>+ goto cleanup;
>+ }
>+
>+ if (direrr < 0)
>+ goto cleanup;
>+
>+ /* enumerate sockets in the node */
>+ CPU_ZERO(&sock_map);
>+ lastonline = -1;
>+ for (cpu = 0; cpu < virBitmapSize(cpu_map); cpu++) {
>+ if (!virBitmapIsBitSet(cpu_map, cpu))
>+ continue;
>+
> if ((online = virNodeGetCpuValue(node, cpu, "online", 1)) < 0)
> goto cleanup;
>
> if (!online)
> continue;
>
>+ /* If subcores are in use, only the primary thread in each subcore
>+ * can be online. If a secondary thread is found online, it means
>+ * the configuration has been tampered with by the user and we can't
>+ * rely on our logic to count threads properly */
>+ if (threads_per_subcore > 0 &&
>+ lastonline >= 0 &&
>+ (cpu - lastonline) < threads_per_subcore) {
I think you should also check for out-of-order offline thread not just
online ones, so if you move it above the check for whether this one
ins online, and change the condition to this:
if (threads_per_subcore > 0 &&
lastonline >= 0 &&
(online == ((cpu - lastonline) != threads_per_subcore))
ACK with that changed (if you agree, of course).
>+ /* Fallback to the subcore-unaware logic */
>+ threads_per_subcore = 0;
>+ }
>+ lastonline = cpu;
>+
> /* Parse socket */
> if ((sock = virNodeParseSocket(node, arch, cpu)) < 0)
> goto cleanup;
>@@ -459,9 +523,6 @@ virNodeParseNode(const char *node,
> sock_max = sock;
> }
>
>- if (direrr < 0)
>- goto cleanup;
>-
> sock_max++;
>
> /* allocate cpu maps for each socket */
>@@ -471,21 +532,31 @@ virNodeParseNode(const char *node,
> for (i = 0; i < sock_max; i++)
> CPU_ZERO(&core_maps[i]);
>
>- /* iterate over all CPU's in the node */
>- rewinddir(cpudir);
>- while ((direrr = virDirRead(cpudir, &cpudirent, node)) > 0) {
>- if (sscanf(cpudirent->d_name, "cpu%u", &cpu) != 1)
>+ /* Iterate over all CPUs in the node */
>+ lastonline = -1;
>+ for (cpu = 0; cpu < virBitmapSize(cpu_map); cpu++) {
>+ if (!virBitmapIsBitSet(cpu_map, cpu))
> continue;
>
> if ((online = virNodeGetCpuValue(node, cpu, "online", 1)) < 0)
> goto cleanup;
>
> if (!online) {
>- (*offline)++;
>+ if (threads_per_subcore > 0 &&
>+ lastonline >= 0 &&
>+ (cpu-lastonline) < threads_per_subcore) {
>+ /* Secondary offline threads are counted as online when
>+ * subcores are in use */
>+ processors++;
>+ } else {
>+ /* But they are counted as offline otherwise */
>+ (*offline)++;
>+ }
> continue;
> }
>
> processors++;
>+ lastonline = cpu;
>
> /* Parse socket */
> if ((sock = virNodeParseSocket(node, arch, cpu)) < 0)
>@@ -513,9 +584,6 @@ virNodeParseNode(const char *node,
> *threads = siblings;
> }
>
>- if (direrr < 0)
>- goto cleanup;
>-
> /* finalize the returned data */
> *sockets = CPU_COUNT(&sock_map);
>
>@@ -528,6 +596,12 @@ virNodeParseNode(const char *node,
> *cores = core;
> }
>
>+ if (threads_per_subcore > 0) {
>+ /* The thread count ignores offline threads, which means that only
>+ * only primary threads have been considered so far. If subcores
>+ * are in use, we need to also account for secondary threads */
>+ *threads *= threads_per_subcore;
>+ }
> ret = processors;
>
> cleanup:
>@@ -537,6 +611,7 @@ virNodeParseNode(const char *node,
> ret = -1;
> }
> VIR_FREE(core_maps);
>+ virBitmapFree(cpu_map);
>
> return ret;
> }
>@@ -2116,3 +2191,40 @@ nodeAllocPages(unsigned int npages,
> cleanup:
> return ret;
> }
>+
>+/* Get the number of threads per subcore.
>+ *
>+ * This will be 2, 4 or 8 on POWER hosts, depending on the current
>+ * micro-threading configuration, and 0 everywhere else.
>+ *
>+ * Returns the number of threads per subcore if subcores are in use, zero
>+ * if subcores are not in use, and a negative value on error */
>+int
>+nodeGetThreadsPerSubcore(virArch arch)
>+{
>+ int kvmfd;
>+ int threads_per_subcore = 0;
>+
>+#if HAVE_LINUX_KVM_H && defined(KVM_CAP_PPC_SMT)
>+ if (ARCH_IS_PPC64(arch)) {
>+
>+ kvmfd = open("/dev/kvm", O_RDONLY);
>+ if (kvmfd < 0) {
>+ threads_per_subcore = -1;
>+ goto out;
>+ }
>+
>+ /* For Phyp and KVM based guests the ioctl for KVM_CAP_PPC_SMT
>+ * returns zero and both primary and secondary threads will be
>+ * online */
>+ threads_per_subcore = ioctl(kvmfd,
>+ KVM_CHECK_EXTENSION,
>+ KVM_CAP_PPC_SMT);
>+
>+ out:
>+ VIR_FORCE_CLOSE(kvmfd);
>+ }
>+#endif /* HAVE_LINUX_KVM_H && defined(KVM_CAP_PPC_SMT) */
>+
>+ return threads_per_subcore;
>+}
>diff --git a/src/nodeinfo.h b/src/nodeinfo.h
>index 047bd5c..41739a1 100644
>--- a/src/nodeinfo.h
>+++ b/src/nodeinfo.h
>@@ -46,6 +46,7 @@ int nodeGetMemory(unsigned long long *mem,
> virBitmapPtr nodeGetPresentCPUBitmap(void);
> virBitmapPtr nodeGetCPUBitmap(int *max_id);
> int nodeGetCPUCount(void);
>+int nodeGetThreadsPerSubcore(virArch arch);
>
> int nodeGetMemoryParameters(virTypedParameterPtr params,
> int *nparams,
>--
>2.4.3
>
>--
>libvir-list mailing list
>libvir-list at redhat.com
>https://www.redhat.com/mailman/listinfo/libvir-list
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20150707/6609ec21/attachment-0001.sig>
More information about the libvir-list
mailing list