Ready for more Fast Packets?!
In Part 1 we reviewed the fundamentals of achieving zero packet loss, covering the concepts behind the process. In this next instalment, Federico Iezzi, EMEA Cloud Architect with Red Hat, continues the series by diving deep into the details behind the tuning.
Buckle in and join the fast lane of packet processing!
Getting into the specifics
It's important to understand the components we'll be working with for the tuning. Achieving our goal of zero packet loss begins right at the core of Red Hat OpenStack Platform: Red Hat Enterprise Linux (RHEL).
The tight integration between these products is essential to our success here and really demonstrates how the solid RHEL foundation is an incredibly powerful aspect of Red Hat OpenStack Platform.
So, let's do it ...
SystemD CPUAffinity
The SystemD CPUAffinity setting allows you to indicate which CPU cores should be used when SystemD spawns new processes. Since it only works for SystemD-managed services, two things should be noted. Firstly, kernel threads have to be managed in a different way, and secondly, all user-executed processes must be handled very carefully as they might interrupt either the PMD threads or the VNFs. So, CPUAffinity is, in a way, a simplified replacement for the kernel boot parameter isolcpus. Of course, isolcpus does much more, such as disabling kernel and process thread balancing, but unless you are doing real-time work it is often counter-productive and shouldn't be used.
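As a minimal sketch, assuming cores 0 and 1 are kept for housekeeping while the remaining cores are isolated for PMD threads and VNFs, the setting lives in /etc/systemd/system.conf:

```
# /etc/systemd/system.conf
# Pin all SystemD-spawned processes to the housekeeping cores
# (core numbers are illustrative; adjust to your host's layout)
CPUAffinity=0 1
```

A reboot is required for the new affinity to apply to every service.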
So, what happened to isolcpus?
Isolcpus was, until a few years ago, the way to isolate both kernel and user processes to specific CPU cores. To make it more real-time oriented, load balancing between the isolated CPU cores was disabled. This means that once a thread (or a set of threads) is created on an isolated CPU core, even if that core is at 100% usage, the Linux process scheduler (SCHED_OTHER) will never move any of those threads away. For more info, check out this article on the Red Hat Customer Portal (registration required).
IRQBALANCE_BANNED_CPUS
The IRQBALANCE_BANNED_CPUS setting allows you to indicate which CPU cores should be skipped when rebalancing the IRQs. CPU cores whose corresponding bits are set to one in this mask will not have any IRQs assigned to them on rebalance (this can be double-checked in /proc/interrupts).
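A sketch, assuming cores 2-7 are the isolated cores, so the hex mask with bits 2-7 set is 0xfc; irqbalance reads the variable from its sysconfig file:

```
# /etc/sysconfig/irqbalance
# Ban the isolated cores (bits 2-7 set -> 0xfc) from IRQ rebalancing
IRQBALANCE_BANNED_CPUS=fc
```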
Tickless Kernel
Setting the kernel boot parameter nohz prevents frequent timer interrupts; a system configured this way is commonly referred to as "tickless." The tickless kernel feature enables "on-demand" timer interrupts: if there is no timer due to expire for, say, 1.5 seconds when the system goes idle, then the system will stay totally idle for 1.5 seconds. The result is fewer interrupts per second, instead of a scheduler interrupt every 1 ms.
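On RHEL, kernel boot parameters are typically set through GRUB; a sketch of the workflow (existing options elided, paths per RHEL 7):

```
# /etc/default/grub -- append to the existing options
GRUB_CMDLINE_LINUX="... nohz=on"

# regenerate the config, then reboot:
# grub2-mkconfig -o /boot/grub2/grub.cfg
```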
Adaptive-Ticks CPUs
Setting the kernel boot parameter nohz_full to the value of the isolated CPU cores ensures the kernel doesn't send scheduling-clock interrupts to CPUs running a single runnable task. Such CPUs are said to be "adaptive-ticks CPUs." This is important for applications with aggressive real-time response constraints because it allows them to improve their worst-case response times by the maximum duration of a scheduling-clock interrupt. It is also important for computationally intensive short-iteration workloads: if any CPU is delayed during a given iteration, all the other CPUs will be forced to wait in idle while the delayed CPU finishes. Thus, the delay is multiplied by one less than the number of CPUs. In these situations, there is again strong motivation to avoid sending scheduling-clock interrupts. Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.
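Extending the sketch, assuming cores 2-7 are the isolated cores reserved for the PMD threads and VNFs:

```
GRUB_CMDLINE_LINUX="... nohz=on nohz_full=2-7"
```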
RCU Callbacks Offload
The kernel boot parameter rcu_nocbs, when set to the value of the isolated CPU cores, causes the offloaded CPUs to never queue RCU callbacks and therefore RCU never prevents offloaded CPUs from entering either dyntick-idle mode or adaptive-tick mode.
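rcu_nocbs normally takes the same core list as nohz_full; continuing the sketch:

```
GRUB_CMDLINE_LINUX="... nohz=on nohz_full=2-7 rcu_nocbs=2-7"
```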
Fixed CPU frequency scaling
The kernel boot parameter intel_pstate, when set to disable, turns off CPU frequency scaling, leaving the CPU frequency at the maximum allowed by the CPU. Adaptive, and therefore varying, CPU frequency results in unstable performance.
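Added to the same illustrative command line:

```
GRUB_CMDLINE_LINUX="... intel_pstate=disable"
```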
nosoftlockup
The kernel boot parameter nosoftlockup disables the logging of backtraces when a process executes on a CPU for longer than the softlockup threshold (default 120 seconds). Typical low-latency programming and tuning techniques involve spinning on a core or modifying scheduler priorities/policies, which can lead to a task reaching this threshold. Without the parameter, if a task has not relinquished the CPU for 120 seconds, the kernel prints a backtrace for diagnostic purposes.
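Putting the boot parameters from the last few sections together, the kernel command line ends up looking like this (a sketch; the core list 2-7 is illustrative and the pre-existing options are elided):

```
# /etc/default/grub
GRUB_CMDLINE_LINUX="... nohz=on nohz_full=2-7 rcu_nocbs=2-7 intel_pstate=disable nosoftlockup"
```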
Dirty pages affinity
Setting the /sys/bus/workqueue/devices/writeback/cpumask value to the CPU cores that are not isolated creates an affinity so that the kernel thread that writes back dirty pages prefers those cores.
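A sketch, assuming cores 0 and 1 are the non-isolated housekeeping cores (bitmask 0x3):

```
echo 3 > /sys/bus/workqueue/devices/writeback/cpumask
```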
Execute workqueue requests
Setting the /sys/devices/virtual/workqueue/cpumask value to the CPU cores that are not isolated defines which kworker threads should receive kernel tasks such as interrupts, timers, I/O, etc.
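The same housekeeping mask applies here:

```
echo 3 > /sys/devices/virtual/workqueue/cpumask
```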
Disable Machine Check Exception
Setting the /sys/devices/system/machinecheck/machinecheck*/ignore_ce value to 1 disables the checking for corrected machine check errors (CE). A machine check exception (MCE) is a type of hardware error that occurs when a computer's central processing unit detects a hardware problem.
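A one-liner covering every machinecheck device on the host:

```
for mc in /sys/devices/system/machinecheck/machinecheck*/ignore_ce; do
    echo 1 > "$mc"
done
```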
KVM Low Latency
Both the standard KVM module and the Intel KVM module support a number of options to reduce latency by removing unwanted VM exits and interrupts.
Pause Loop Exiting
In the kvm_intel module, the parameter ple_gap is set to 0.
Full details can be found on page 37 of “Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3C: System Programming Guide, Part 3 (pdf)”
Periodic Kvmclock Sync
In the kvm module, the parameter kvmclock_periodic_sync is set to 0.
Full details are found in the upstream kernel commit.
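Module parameters persist across reboots when set through modprobe.d; a sketch (the file name is arbitrary):

```
# /etc/modprobe.d/kvm-low-latency.conf
options kvm kvmclock_periodic_sync=0
options kvm_intel ple_gap=0
```

The options take effect when the modules are next (re)loaded, for example at the next reboot.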
SYSCTL Parameters
Some sysctl parameters are inherited, as the cpu-partitioning tuned profile is a child of the network-latency and latency-performance profiles. Below are the essential parameters for achieving zero packet loss:
| Parameter | Value | Details |
| --- | --- | --- |
| kernel.hung_task_timeout_secs | 600 | Increases the hung task timeout; however, no error will be reported given the nosoftlockup kernel boot parameter. From the cpu-partitioning profile. |
| kernel.nmi_watchdog | 0 | Disables the non-maskable interrupt (NMI) watchdog (an NMI is a type of IRQ whose execution is forced). From the cpu-partitioning profile. |
| vm.stat_interval | 10 | Sets the refresh rate of the virtual memory statistics update. The default value is 1 second. From the cpu-partitioning profile. |
| kernel.timer_migration | 1 | In an SMP system, tasks are scheduled on different CPUs by the scheduler and interrupts are balanced across all available CPU cores by the irqbalance daemon, but timers stay stuck on the CPU core that created them. With timer_migration enabled, recent kernels (https://bugzilla.redhat.com/show_bug.cgi?id=1408308) always try to migrate timers away from the nohz_full CPU cores. From the cpu-partitioning profile. |
| kernel.numa_balancing | 0 | Disables the automatic NUMA process balancing across the NUMA nodes. From the network-latency profile. |
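These values come from the tuned profiles, but they can be verified (or set by hand) with sysctl; for example:

```
sysctl kernel.hung_task_timeout_secs kernel.nmi_watchdog \
       vm.stat_interval kernel.timer_migration kernel.numa_balancing
```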
Disable Transparent Hugepages
Transparent hugepages are a mechanism that merges smaller memory pages (4K) into bigger memory pages (usually 2M). Setting the option transparent_hugepage to "never" disables this behaviour.
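It can be disabled either at boot or at runtime; a sketch:

```
# at boot, via the kernel command line:
GRUB_CMDLINE_LINUX="... transparent_hugepage=never"

# or at runtime:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```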
Tuned parameters
The following tuned parameters should be configured to provide low latency and disable power-saving mechanisms. Setting the CPU governor to "performance" runs the CPU at its maximum frequency.
- force_latency = 1
- governor = performance
- energy_perf_bias = performance
- min_perf_pct = 100
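In tuned these map to the [cpu] plugin of a profile's tuned.conf. A minimal sketch of a hypothetical custom profile layered on top of cpu-partitioning (the profile name is made up; cpu-partitioning already ships equivalent values):

```
# /etc/tuned/my-nfv-profile/tuned.conf
[main]
include=cpu-partitioning

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100
```

Activate it with `tuned-adm profile my-nfv-profile`.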
Speeding to a conclusion
As you can see, a lot of preparation and tuning goes into achieving zero packet loss. This blog post detailed many of the parameters that require attention to make it happen.
Next time the series finishes with an example of how this all comes together!
Love all this deep tech? Want to keep your Red Hat OpenStack Platform deployment rock solid? Check out the Red Hat Services webinar "Don't fail at scale: How to plan, build, and operate a successful OpenStack cloud" today!
About the author
Federico Iezzi is an open-source evangelist who has witnessed the Telco NFV transformation first-hand. Over his career, Iezzi has achieved a number of international firsts in the public cloud space and has about a decade of experience with OpenStack. He has been following the Telco NFV transformation since 2014. At Red Hat, Federico is a member of the EMEA Telco practice as a Principal Architect.