[libvirt] CFS Hardlimits and the libvirt cgroups implementation

Wen Congyang wency at cn.fujitsu.com
Fri Jun 10 06:59:06 UTC 2011


At 06/09/2011 03:20 AM, Adam Litke wrote:
> Hi all.  In this post I would like to bring up 3 issues which are
> tightly related: 1. unwanted behavior when using cfs hardlimits with
> libvirt, 2. Scaling cputune.share according to the number of vcpus, 3.
> API proposal for CFS hardlimits support.
> 
> 
> === 1 ===
> Mark Peloquin (on cc:) has been looking at implementing CFS hard limit
> support on top of the existing libvirt cgroups implementation and he has
> run into some unwanted behavior when enabling quotas that seems to be
> affected by the cgroup hierarchy being used by libvirt.
> 
> Here are Mark's words on the subject (posted by me while Mark joins this
> mailing list):
> ------------------
> I've conducted a number of measurements using CFS.
> 
> The system config is a 2 socket Nehalem system with 64GB ram. Installed
> is RHEL6.1-snap4. The guest VMs run RHEL5.5 (32-bit). I've
> replaced the kernel with 2.6.39-rc6+ with patches from
> Paul-V6-upstream-breakout.tar.bz2 for CFS bandwidth. The test config
> uses 5 VMs of various vcpu and memory sizes. Being used are 2 VMs with 2
> vcpus and 4GB of memory, 1 VM with 4vcpus/8GB, another VM with
> 8vcpus/16GB and finally a VM with 16vcpus/16GB.
> 
> Thus far the tests have been limited to cpu intensive workloads. Each VM
> runs a single instance of the workload. The workload is configured to
> create one thread for each vcpu in the VM. The workload is then capable
> of completely saturating each vcpu in each VM.
> 
> CFS was tested using two different topologies.
> 
> First, vcpu cgroups were created under each VM cgroup created by libvirt.
> The vcpu threads from the VM's cgroup/tasks were moved to the tasks list
> of each vcpu cgroup, one thread per vcpu cgroup. This tree structure
> permits setting the CFS quota and period per vcpu. Default values for
> cpu.shares (1024), quota (-1) and period (500000us) were used in each VM
> cgroup and inherited by the vcpu cgroups. With these settings the workload
> generated system cpu utilization (measured in the host) of >99% guest,
> ~0.1% idle, 0.14% user and 0.38% system.
> 
> Second, using the same topology, the CFS quota in each vcpu's cgroup was
> set to 250000us, allowing each vcpu to consume 50% of a cpu. The cpu
> workload was run again. This time the total system cpu utilization was
> measured at 75% guest, ~24% idle, 0.15% user and 0.40% system.
> 
> The topology was changed such that a cgroup for each vcpu was created in
> /cgroup/cpu.
> 
> The first test used the default/inherited shares and CFS quota and
> period. The measured system cpu utilization was >99% guest, ~0.5% idle,
> 0.13% user and 0.38% system, similar to the default settings using vcpu
> cgroups under libvirt.
> 
> The next test, like before the topology change, set the vcpu quota
> values to 250000us or 50% of a cpu. In this case the measured system cpu
> utilization was ~92% guest, ~7.5% idle, 0.15% user and 0.38% system.
> 
> We can see that moving the vcpu cgroups out from under libvirt/qemu
> makes a big difference in idle cpu time.
> 
> Does this suggest a possible problem with libvirt?

I do not think it is a problem in libvirt.
Libvirt only uses the interface provided by the cgroup system. It may be a
problem in cgroups or in CFS bandwidth.
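
At the cgroup level, the per-vcpu setup Mark describes amounts to roughly
the following (a sketch only, with minimal error handling; the /cgroup/cpu
mount point, the libvirt/qemu/guest1 path and the thread id are assumptions
about the test machine, not anything libvirt guarantees):

--- snip ---
/* Rough sketch of the per-vcpu cgroup setup described above.  The
 * /cgroup/cpu mount point, the libvirt/qemu/guest1 path and the thread
 * id are placeholders for this example. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

static int write_val(const char *path, long long val)
{
    FILE *fp = fopen(path, "w");
    if (!fp)
        return -1;
    fprintf(fp, "%lld\n", val);
    return fclose(fp);
}

int main(void)
{
    const char *vcpu_dir = "/cgroup/cpu/libvirt/qemu/guest1/vcpu0";
    pid_t vcpu_tid = 12345;       /* placeholder: tid of the vcpu thread */
    char path[256];

    mkdir(vcpu_dir, 0755);        /* one cgroup per vcpu */

    snprintf(path, sizeof(path), "%s/cpu.cfs_period_us", vcpu_dir);
    write_val(path, 500000);      /* 500ms period, as in the test */

    snprintf(path, sizeof(path), "%s/cpu.cfs_quota_us", vcpu_dir);
    write_val(path, 250000);      /* cap the vcpu at 50% of one cpu */

    snprintf(path, sizeof(path), "%s/tasks", vcpu_dir);
    write_val(path, vcpu_tid);    /* move the vcpu thread into the cgroup */

    return 0;
}
--- snip ---

Running the same sketch with the vcpu directories created directly under
/cgroup/cpu should make it easy to compare the two topologies with
identical values.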

> ------------------
> 
> Has anyone else seen this type of behavior when using cgroups with CFS
> hardlimits?  We are working with the kernel community to see if there
> might be a bug in cgroups itself.
> 
> 
> === 2 ===
> Something else we are seeing is that libvirt's default setting for
> cputune.share is 1024 for any domain (regardless of how many vcpus are
> configured).  This ends up hindering the performance of really large VMs
> (with lots of vcpus) as compared to smaller ones since all domains are
> given equal share.  Would folks consider changing the default for
> 'shares' to be a quantity scaled by the number of vcpus such that bigger
> domains get to use proportionally more host cpu resource?

The value 1024 is the default value in the kernel, not in libvirt.
If you want to change cputune.share, you should edit the XML config file.
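
If you want shares scaled by vcpu count today, a management application can
already do that itself through the existing scheduler-parameter API, e.g.
(a rough sketch; "guest1" is a placeholder, error handling is mostly
omitted, and I am assuming cpu_shares is exposed as an unsigned long long):

--- snip ---
/* Rough sketch: scale cpu_shares by the number of vcpus so larger guests
 * get proportionally more cpu time when the host is contended. */
#include <stdio.h>
#include <string.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom = virDomainLookupByName(conn, "guest1");
    virDomainInfo info;
    virSchedParameter param;

    virDomainGetInfo(dom, &info);          /* info.nrVirtCpu = vcpu count */

    memset(&param, 0, sizeof(param));
    strncpy(param.field, "cpu_shares", VIR_DOMAIN_SCHED_FIELD_LENGTH - 1);
    param.type = VIR_DOMAIN_SCHED_FIELD_ULLONG;   /* assumed type */
    param.value.ul = 1024ULL * info.nrVirtCpu;    /* 1024 per vcpu */

    if (virDomainSetSchedulerParameters(dom, &param, 1) < 0)
        fprintf(stderr, "failed to set cpu_shares\n");

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}
--- snip ---

Setting <shares> under <cputune> in the domain XML achieves the same thing
persistently.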

> 
> 
> === 3 ===
> Besides the above issues, I would like to open a discussion on what the
> libvirt API for enabling cpu hardlimits should look like.  Here is what
> I was thinking:

I need this feature as soon as the CFS bandwidth patchset is merged into the
upstream kernel, so I have been working on this recently.

> 
> Two additional scheduler parameters (based on the names given in the
> cgroup fs) will be recognized for qemu domains: 'cfs_period' and
> 'cfs_quota'.  These can use the existing
> virDomain[Get|Set]SchedulerParameters() API.  The Domain XML schema
> would be updated to permit the following:
> 
> --- snip ---
> <cputune>
>   ...
>   <cfs_period>1000000</cfs_period>
>   <cfs_quota>500000</cfs_quota>
> </cputune>
> --- snip ---
> 
> To actuate these configuration settings, we simply apply the values to
> the appropriate cgroup(s) for the domain.  We would prefer that each
> vcpu be in its own cgroup to ensure equal and fair scheduling across all
> vcpus running on the system.  (We will need to resolve the issues
> described by Mark in order to figure out where to hang these cgroups).
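
To make the proposal concrete, a client would presumably end up doing
something like the following (a sketch only; the cfs_period/cfs_quota names
are the ones proposed above and do not exist in libvirt today, and the
parameter types are my guess):

--- snip ---
/* Rough sketch of a client using the proposed cfs_period/cfs_quota
 * scheduler parameters; these names do not exist in libvirt today and
 * the types are assumptions.  "guest1" is a placeholder. */
#include <stdio.h>
#include <string.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom = virDomainLookupByName(conn, "guest1");
    virSchedParameter params[2];

    memset(params, 0, sizeof(params));

    strncpy(params[0].field, "cfs_period", VIR_DOMAIN_SCHED_FIELD_LENGTH - 1);
    params[0].type = VIR_DOMAIN_SCHED_FIELD_ULLONG;
    params[0].value.ul = 1000000;          /* period in usec, as in the XML */

    strncpy(params[1].field, "cfs_quota", VIR_DOMAIN_SCHED_FIELD_LENGTH - 1);
    params[1].type = VIR_DOMAIN_SCHED_FIELD_LLONG;
    params[1].value.l = 500000;            /* quota in usec; -1 = unlimited */

    if (virDomainSetSchedulerParameters(dom, params, 2) < 0)
        fprintf(stderr, "failed to set cfs quota/period\n");

    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}
--- snip ---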

Each vcpu in its own cgroup?
Do you mean that each vcpu has a separate thread?

AFAIK, qemu does not create a thread for each vcpu.

Thanks.
Wen Congyang

> 
> 
> 
> Thanks for sticking with me through this long email.  I greatly
> appreciate your thoughts and comments on these topics.
> 



