Ready for more Fast Packets?!
In Part 1 we reviewed the fundamentals of achieving zero packet loss, covering the concepts behind the process. In this next installment, Federico Iezzi, EMEA Cloud Architect with Red Hat, continues the series by diving deep into the details behind the tuning.
Buckle in and join the fast lane of packet processing!
Getting into the specifics
It's important to understand the components we'll be working with for the tuning. Achieving our goal of zero packet loss begins right at the core of Red Hat OpenStack Platform: Red Hat Enterprise Linux (RHEL).
The tight integration between these products is essential to our success here and really demonstrates how the solid RHEL foundation is an incredibly powerful aspect of Red Hat OpenStack Platform.
So, let's do it ...
SystemD CPUAffinity
The SystemD CPUAffinity setting allows you to indicate which CPU cores should be used when SystemD spawns new processes. Since it only works for SystemD-managed services, two things should be noted. First, kernel threads have to be managed in a different way; second, all user-executed processes must be handled very carefully, as they might interrupt either the PMD threads or the VNFs. CPUAffinity is, in a way, a simplified replacement for the kernel boot parameter isolcpus. Of course, isolcpus does much more, such as disabling kernel and process thread balancing, but that can often be counter-productive unless you are doing real-time work, so it shouldn't be used here.
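As a minimal sketch of what this looks like in practice (the core numbers below are examples only; pick the housekeeping cores for your own topology):

```bash
# Sketch: pin SystemD-spawned processes to housekeeping cores 0 and 1
# (example values only) by editing /etc/systemd/system.conf:
#
#   [Manager]
#   CPUAffinity=0 1
#
# Re-execute systemd so newly spawned services pick up the new affinity:
systemctl daemon-reexec
```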
So, what happened to isolcpus?
Isolcpus was, until a few years ago, the way to isolate both kernel and user processes to specific CPU cores. To make it more real-time oriented, load balancing between the isolated CPU cores is disabled. This means that once a thread (or a set of threads) is created on an isolated CPU core, even if that core is at 100% usage, the Linux process scheduler (SCHED_OTHER) will never move any of those threads away. For more info, check out this article on the Red Hat Customer Portal (registration required).
IRQBALANCE_BANNED_CPUS
The IRQBALANCE_BANNED_CPUS setting allows you to indicate which CPU cores should be skipped when rebalancing IRQs. CPU cores whose corresponding bits are set to one in this mask will not have any IRQs assigned to them on rebalance (this can be double-checked in /proc/interrupts).
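A minimal sketch, assuming cores 2-7 are the isolated ones on an 8-core machine (the mask is a hexadecimal bitmap, one bit per core):

```bash
# Sketch: ban IRQs from isolated cores 2-7
# (binary 11111100 -> hex fc; adjust the mask for your own core list)
# by setting this in /etc/sysconfig/irqbalance:
#
#   IRQBALANCE_BANNED_CPUS=fc
#
systemctl restart irqbalance
# Verify that no IRQs keep landing on the banned cores:
cat /proc/interrupts
```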
Tickless Kernel
Setting the kernel boot parameter nohz prevents frequent timer interrupts; it is common practice to refer to such a system as "tickless." The tickless kernel feature enables "on-demand" timer interrupts: if there is no timer due to expire for, say, 1.5 seconds when the system goes idle, the system will stay totally idle for those 1.5 seconds. The result is fewer interrupts per second, instead of scheduler interrupts occurring every 1 ms.
Adaptive-Ticks CPUs
Setting the kernel boot parameter nohz_full to the value of the isolated CPU cores ensures the kernel doesn't send scheduling-clock interrupts to CPUs with a single runnable task. Such CPUs are said to be "adaptive-ticks CPUs." This is important for applications with aggressive real-time response constraints, because it allows them to improve their worst-case response times by the maximum duration of a scheduling-clock interrupt. It is also important for computationally intensive, short-iteration workloads: if any CPU is delayed during a given iteration, all the other CPUs will be forced to wait idle while the delayed CPU finishes, so the delay is multiplied by one less than the number of CPUs. In these situations, there is again strong motivation to avoid sending scheduling-clock interrupts. Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
RCU Callbacks Offload
The kernel boot parameter rcu_nocbs, when set to the value of the isolated CPU cores, causes those CPUs to never queue RCU callbacks; RCU therefore never prevents the offloaded CPUs from entering either dyntick-idle mode or adaptive-tick mode.
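Tying the last three parameters together, here is a sketch of the corresponding boot line, assuming cores 2-19 are the isolated ones (adjust the ranges to your own topology):

```bash
# Sketch: the three parameters combined on the kernel boot line
# (append to GRUB_CMDLINE_LINUX in /etc/default/grub):
#
#   nohz=on nohz_full=2-19 rcu_nocbs=2-19
#
grub2-mkconfig -o /boot/grub2/grub.cfg   # regenerate GRUB config, then reboot
cat /proc/cmdline                        # verify after the reboot
```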
Fixed CPU frequency scaling
The kernel boot parameter intel_pstate, when set to disable, disables CPU frequency scaling, keeping the CPU frequency at the maximum allowed by the CPU. An adaptive, and therefore varying, CPU frequency results in unstable performance.
nosoftlockup
The kernel boot parameter nosoftlockup disables the logging of backtraces when a process executes on a CPU for longer than the soft-lockup threshold (120 seconds by default). Typical low-latency programming and tuning techniques, such as spinning on a core or modifying scheduler priorities/policies, can lead to a task hitting this threshold; without this parameter, the kernel prints a backtrace for diagnostic purposes whenever a task has not relinquished the CPU for 120 seconds.
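These two settings are also kernel boot parameters, so they can simply be appended to the same boot line sketched above:

```bash
# Sketch: additional entries for the same GRUB_CMDLINE_LINUX in /etc/default/grub:
#
#   intel_pstate=disable nosoftlockup
#
grub2-mkconfig -o /boot/grub2/grub.cfg   # regenerate and reboot to apply
```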
Dirty pages affinity
Setting the /sys/bus/workqueue/devices/writeback/cpumask value to the CPU cores that are not isolated creates an affinity for the kernel thread that writes out dirty pages, keeping it on the housekeeping cores.
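A minimal sketch, assuming cores 0 and 1 are the non-isolated housekeeping cores:

```bash
# Sketch: keep writeback work on housekeeping cores 0 and 1
# (hex mask, one bit per core; 3 = cores 0-1, example value only):
echo 3 > /sys/bus/workqueue/devices/writeback/cpumask
```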
Execute workqueue requests
Setting the /sys/devices/virtual/workqueue/cpumask value to the CPU cores that are not isolated defines which kworker threads receive kernel workqueue tasks such as interrupts, timers, I/O, and so on.
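The same hexadecimal mask idea applies here (again assuming cores 0 and 1 are the housekeeping cores; note that sysfs writes do not persist across reboots):

```bash
# Sketch: restrict generic workqueues to housekeeping cores 0 and 1
# (3 = cores 0-1, example value only; non-persistent):
echo 3 > /sys/devices/virtual/workqueue/cpumask
```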
Disable Machine Check Exception
Setting the /sys/devices/system/machinecheck/machinecheck*/ignore_ce value to 1 disables machine check exceptions for corrected errors. A machine check exception (MCE) is a type of computer hardware error that occurs when a computer's central processing unit detects a hardware problem.
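Since there is one machinecheck entry per CPU, a small loop covers them all (a sketch; like the other sysfs writes, this is not persistent across reboots):

```bash
# Sketch: set ignore_ce on every CPU's machinecheck entry:
for mc in /sys/devices/system/machinecheck/machinecheck*; do
    echo 1 > "$mc/ignore_ce"
done
```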
KVM Low Latency
Both the standard KVM module and the Intel KVM module support a number of options to reduce latency by removing unwanted VM exits and interrupts.
Pause Loop Exiting
In the kvm_intel module, the parameter ple_gap is set to 0.
Full details can be found on page 37 of "Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3C: System Programming Guide, Part 3 (pdf)".
Periodic Kvmclock Sync
In the kvm module, the parameter kvmclock_periodic_sync is set to 0.
Full details are found in the upstream kernel commit.
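A sketch of how both module options can be made persistent with a modprobe configuration file (the file name is arbitrary):

```bash
# Sketch: persist both KVM low-latency options (example file name):
cat > /etc/modprobe.d/kvm-low-latency.conf <<'EOF'
options kvm kvmclock_periodic_sync=0
options kvm_intel ple_gap=0
EOF
# Reload the modules (with no VMs running) so the options take effect:
modprobe -r kvm_intel kvm
modprobe kvm_intel
# Verify:
cat /sys/module/kvm_intel/parameters/ple_gap
```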
SYSCTL Parameters
Some sysctl parameters are inherited, as the cpu-partitioning tuned profile is built on top of the network-latency profile, which in turn builds on latency-performance. Below are the essential parameters for achieving zero packet loss:
| Parameter | Value | Details |
| --- | --- | --- |
| kernel.hung_task_timeout_secs | 600 | Increases the hung task timeout; no error will be reported anyway, given the nosoftlockup kernel boot parameter. From the cpu-partitioning profile. |
| kernel.nmi_watchdog | 0 | Disables the NMI (non-maskable interrupt, a type of IRQ that is force-executed) watchdog. From the cpu-partitioning profile. |
| vm.stat_interval | 10 | Sets the refresh interval of the virtual memory statistics update; the default is 1 second. From the cpu-partitioning profile. |
| kernel.timer_migration | 1 | In an SMP system, tasks are scheduled on different CPUs by the scheduler and interrupts are balanced across the available CPU cores by the irqbalance daemon, but timers stay stuck on the CPU core that created them. With timer_migration enabled, recent kernels (see https://bugzilla.redhat.com/show_bug.cgi?id=1408308) always try to migrate timers away from the nohz_full CPU cores. From the cpu-partitioning profile. |
| kernel.numa_balancing | 0 | Disables automatic balancing of processes across the NUMA nodes. From the network-latency profile. |
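For reference, here is a sketch of the same values applied by hand with sysctl (non-persistent; in practice the tuned profiles set these for you):

```bash
# Sketch: apply the table above manually for a quick test:
sysctl -w kernel.hung_task_timeout_secs=600
sysctl -w kernel.nmi_watchdog=0
sysctl -w vm.stat_interval=10
sysctl -w kernel.timer_migration=1
sysctl -w kernel.numa_balancing=0
```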
Disable Transparent Hugepages
Setting the option transparent_hugepage to "never" disables transparent hugepages, the mechanism that automatically merges smaller memory pages (4K) into bigger memory pages (usually 2M).
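A minimal sketch of both ways to set it:

```bash
# Sketch: disable transparent hugepages at runtime (non-persistent) ...
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# ... or persistently via the kernel boot line in /etc/default/grub:
#
#   transparent_hugepage=never
```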
Tuned parameters
The following tuned parameters should be configured to provide low latency and disable power-saving mechanisms. Setting the CPU governor to "performance" runs the CPU at its maximum frequency.
- force_latency = 1
- governor = performance
- energy_perf_bias = performance
- min_perf_pct = 100
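A sketch of how these values land in a custom tuned profile (the profile name is an example, and the cpu-partitioning profile already carries most of this):

```bash
# Sketch: wrap the tuned parameters in a custom profile:
mkdir -p /etc/tuned/my-nfv-profile
cat > /etc/tuned/my-nfv-profile/tuned.conf <<'EOF'
[main]
include=cpu-partitioning

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100
EOF
# Activate it:
tuned-adm profile my-nfv-profile
```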
Speeding to a conclusion
As you can see, there is a lot of preparation and tuning that goes into achieving zero packet loss. This blog post detailed many parameters that require attention and tuning to make this happen.
Next time the series finishes with an example of how this all comes together!
Love all this deep tech? Want to ensure you keep your Red Hat OpenStack Platform deployment rock solid? Check out the Red Hat Services webinar "Don't fail at scale: How to plan, build, and operate a successful OpenStack cloud" today!
About the author
Federico Iezzi is an open-source evangelist who has witnessed the Telco NFV transformation. Over his career, Iezzi achieved a number of international firsts in the public cloud space and also has about a decade of experience with OpenStack. He has been following the Telco NFV transformation since 2014. At Red Hat, Federico is a member of the EMEA Telco practice as a Principal Architect.