As computing demands have risen, networking demands have risen too. Unfortunately, the engine of the computer, the central processing unit (CPU), remains a performance bottleneck for many applications today. Traditionally, your options to work around this problem were to super-size a machine with more powerful hardware, distribute the software across multiple computers, or make the software more efficient.
Today we will look at how adding more powerful hardware to handle specific tasks in your computing infrastructure can alleviate the demands on the CPU. This is generally referred to as hardware offloading.
What is hardware offloading?
When certain tasks are offloaded from the CPU to a specialized hardware resource, the CPU no longer needs to process those instructions, which reduces its workload. These specialized hardware resources also provide additional computing power, which can enable functionality that would otherwise not have been considered worth the computational cost.
Offloading, or computational offloading, transfers resource-intensive computational tasks to a separate processor, also known as a coprocessor. This separate processor might be a hardware accelerator situated inside the server, but it could also be an external platform. In short, offloading or acceleration is the process by which computing tasks are offloaded onto computer hardware or systems that have been specially designed to perform particular functions more efficiently than is possible in software running on a general-purpose CPU.
While today’s CPUs are multi-core and often feature parallel single-instruction, multiple-data (SIMD) units, hardware acceleration still yields benefits, particularly for computation-intensive algorithms that are executed frequently in a task or program.
In terms of networking, hardware offloading moves the processing of network tasks from the CPU to the network interface card (NIC). This frees up CPU cycles, relieves system bottlenecks such as the Peripheral Component Interconnect (PCI) bus, and can improve throughput, efficiency, and latency.
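To make "network tasks" concrete, here is a minimal sketch of one piece of per-packet work that NICs commonly take over from the CPU: computing the 16-bit Internet checksum described in RFC 1071. The post itself does not name checksum offload, so treat this as an illustrative assumption rather than a description of any particular NIC; the function name and the sample header bytes are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte

    total = 0
    for i in range(0, len(data), 2):
        # Add each big-endian 16-bit word to the running sum.
        total += (data[i] << 8) | data[i + 1]

    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    return ~total & 0xFFFF


# Illustrative 20-byte IPv4 header with its checksum field set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))  # 0xb861
```

In software, this loop runs on the CPU for every packet sent or received. With checksum offload enabled, the NIC performs the equivalent calculation in hardware, so those cycles stay available for application work.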
So, how does network hardware offloading work? NICs are specialized to handle the load of processing network tasks better than CPUs, and just like CPUs, not all NICs are the same.
In the case of networking, tasks are offloaded to a network processing unit (NPU), a processor specialized for network workloads, on the NIC. The latest generation of NICs, which can take over this load from modern CPUs, are referred to as SmartNICs. SmartNICs free up CPUs (and their queues) to handle larger workloads, or allow less expensive CPUs to handle the same load. Additionally, SmartNICs provide enhanced network, storage, and security functionality that cannot be offloaded to traditional NICs, providing an added incentive to use the offloading hardware.
If you spend too much time looking into system or networking performance, SmartNICs are something you should definitely look into more deeply. And, if you are looking to refresh the architectures of your systems and continue to improve your security posture, then this post may be for you.
Is this new?
There is a long history of pushing everything by default to the CPU, as well as a history of offloading CPU-intensive processes to specialized hardware. Graphics processing units (GPUs) for computer graphics and sound cards or sound card mixers for computer sound processing are common forms of hardware acceleration for both home and business use.
Similarly, digital signal processors, which are specialized microprocessor chips architected for the needs of digital signal processing, are used in numerous technologies including telecommunications, radar, sonar, and consumer electronic devices such as mobile phones and high-definition televisions (HDTVs). With advances in technology and demands of the marketplace, Artificial Intelligence (AI) accelerators, cryptographic accelerators, and secure cryptoprocessors are becoming more common. In particular, Transport Layer Security (TLS) acceleration (formerly known as Secure Sockets Layer (SSL) acceleration) offloads processor-intensive public-key encryption activities (most frequently the handshake process) for TLS and SSL to a hardware accelerator.
Similar to hardware acceleration, computing can be offloaded to external platforms over a network to provide computing power and overcome the hardware limitations of a device, such as limited computational power, storage, and energy.
The explosive growth in demand for mathematical computation, AI, and cryptographic services is colliding with the rapidly approaching end of Moore’s law and Dennard scaling, which makes incremental gains in semiconductor performance harder and more costly to achieve.
The migration to cloud-based computing has led cloud providers to move networking overhead from CPUs to SmartNICs, because they now need to perform networking operations that they did not have to before. Moving this overhead to SmartNICs makes logical and financial sense: it does not make sense to charge for this non-value-added service, and competition continues to drive down prices. This makes offloading, as well as the need for enhanced networking capabilities, more important than ever for CPU and network performance.
With network hardware offloading, CPU resources, internal bus bandwidth, and queues are spared the overhead of handling networking traffic. This leads to general performance improvements, especially during peak loads. Additionally, network hardware offloading improves network performance because packet switching is faster in hardware than in software. Not only is the hardware optimized for this work, but the packet's path inside the machine is also shorter when the processing is done in the NIC: packets do not need to travel through the various buses and queues inside the machine and back out through the NIC.
Network hardware offloading can provide enhanced networking capabilities, including security capabilities. With additional network processing capability at the computer’s edge in the NIC, it is now more practical than ever to run network security functions within the NIC, such as connection tracking, Access Control Lists (ACLs), Network Address Translation (NAT), firewalls, port mirroring, or other custom functionality that supports or improves networking performance or the functionality of your applications.
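As a rough illustration of what one of these functions involves, the following sketch evaluates a packet against a first-match ACL in software. It is a toy model, not how any particular NIC or driver implements ACLs; the `AclRule` type, the `evaluate` function, and the sample rules are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class AclRule:
    action: str                     # "allow" or "deny"
    src: str                        # source network in CIDR notation
    dst_port: Optional[int] = None  # None matches any destination port


def evaluate(rules: List[AclRule], src_ip: str, dst_port: int,
             default: str = "deny") -> str:
    """Return the action of the first matching rule, or the default."""
    addr = ip_address(src_ip)
    for rule in rules:
        port_matches = rule.dst_port in (None, dst_port)
        if addr in ip_network(rule.src) and port_matches:
            return rule.action
    return default


# Hypothetical rule set: block one subnet, allow HTTPS from the rest of 10/8.
RULES = [
    AclRule("deny", "10.0.5.0/24"),
    AclRule("allow", "10.0.0.0/8", 443),
]

print(evaluate(RULES, "10.0.5.7", 443))  # deny
print(evaluate(RULES, "10.1.2.3", 443))  # allow
print(evaluate(RULES, "10.1.2.3", 22))   # deny
```

Running this lookup on the CPU for every packet competes with application workloads. When the NIC can match the same kind of rules in hardware, the check happens before the packet ever reaches the host's buses and queues.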
Overall networking performance is the primary reason to consider hardware offloading, and it offers a simple basis for calculating the return on investment (ROI). The icing on the networking cake is the list of newly enabled features on the latest generation of intelligent NICs. All of these capabilities could always be run on the CPU, but the CPU has been considered too valuable to spend on most of them, since they are secondary to its primary workloads.
Two companies that are industry leaders in the area of intelligent NICs and secure network hardware offloading are Mellanox (now part of NVIDIA) and Netronome, with their respective ConnectX and Agilio SmartNIC solutions. Red Hat continues to collaborate with NVIDIA and others to upstream code and bring additional capabilities to hardware offloading in Linux.
While organizations have sought to do everything in software, sometimes contrarian thinking makes sense for particular applications. Not all workloads are created equal, and neither are the economics behind these workloads or the cost of the network or security supporting them.
Hardware offloading to modern NICs is an approach that can yield improvements in throughput, efficiency, and latency, in addition to enabling the cost-effective implementation of various security functions. Collaborate with your team, peers, the network team, your information security team, and related business stakeholders, and challenge them to look at the larger picture of application workloads, computing infrastructure (datacenter and cloud), networking, and security to consider all the issues first.
Then, consider the various solutions and their economic value and work together to understand which architecture or solution would best meet your business needs. To find the right solution, remember to keep your mind open to various options and keep your end goal in mind while considering the requirements that you need to meet.
If you are interested in leveraging the full support of hardware offloading on Red Hat Enterprise Linux (RHEL), it is available starting in RHEL 8.2 as part of Fast Datapath.
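Before planning around offloading, it can help to see which offload features a given NIC and driver report. On Linux, `ethtool -k` followed by an interface name lists them as `name: on` or `name: off` lines. The sketch below parses that kind of text into a dictionary. The `parse_ethtool_features` helper, the interface name `eth0`, and the sample text are assumptions made for illustration; the sample was written by hand and not captured from a real system.

```python
from typing import Dict


def parse_ethtool_features(output: str) -> Dict[str, bool]:
    """Map each feature name in `ethtool -k` style text to True (on) or False (off)."""
    features: Dict[str, bool] = {}
    for line in output.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skip blank lines and the "Features for <dev>:" header
        # Values can carry a suffix such as "[fixed]"; only the first word matters.
        features[name] = value.split()[0] == "on"
    return features


# Hand-written sample in the general shape of `ethtool -k eth0` output.
SAMPLE = """Features for eth0:
rx-checksumming: on
tx-checksumming: on
\ttx-checksum-ipv4: off [fixed]
tcp-segmentation-offload: on
hw-tc-offload: off
"""

features = parse_ethtool_features(SAMPLE)
print(features["hw-tc-offload"])  # False
```

On a real host, the input text would come from running `ethtool -k` against the interface in question. Which features appear, and whether they can be changed, depends on the NIC and its driver.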
About the authors
Aaron Conole is a software developer with more than 10 years of experience. He has worked on the development of Linux device drivers and embedded applications. He joined Red Hat in 2015.
Marcelo Leitner is a Principal Software Engineer at Red Hat. He joined the company on the Customer Experience and Engagement team before moving to Engineering, working with networking. Leitner is currently the product owner for some kernel networking subsystems and leads the adoption of OvS hardware offload solutions on Red Hat Enterprise Linux (RHEL), always with a focus on customers' real needs.