Turn on the faucet (tap) and you expect water to flow evenly, not sputter. Turn on the light switch and you expect the light to shine, not flicker. Pick up the phone and you expect a dial tone and the ability to dial a number and hear the other person, not silence. This creates a consumer expectation of “always available.”
Evolution of Telco Service Availability
Historically, telcos coined the term “carrier grade” to define five nines (99.999%) or six nines (99.9999%) availability of telecommunications services. (Five nines availability equates to about five minutes of downtime per year; six nines to about 32 seconds.) Telcos were given a monopoly over a geographic region in exchange for utility-like service, especially as phones must be available in an emergency, for example dialing 911 in the USA or 112 in Europe.
This utility-like service is delivered over land and undersea cables that remain in service barring any weather-related or man-made catastrophe.
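The downtime budgets behind “five nines” and “six nines” follow from simple arithmetic: the fraction of time a service may be unavailable, multiplied by the length of a year. The sketch below assumes a 365-day year; the function name is chosen for illustration only.

```python
# Downtime budget implied by an availability target of N nines.
# Assumption: a 365-day (non-leap) year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def downtime_seconds_per_year(nines: int) -> float:
    """Return the allowed downtime per year, in seconds, for `nines` nines."""
    unavailability = 10.0 ** -nines  # e.g. 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability


for n in (5, 6):
    seconds = downtime_seconds_per_year(n)
    print(f"{n} nines: {seconds:.1f} s/year ({seconds / 60:.2f} min/year)")
```

Each additional nine cuts the allowed downtime by a factor of ten.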
Since then, telecommunication services have vastly grown in complexity. Telcos support multi-function wireless smartphones that transmit voice calls, transfer data, and offer email and web functionality. These services are performed over wired and wireless networks: worldwide, inside buildings, up mountains and down valleys.
Still, for a long time, telecommunication services were built as they had been decades earlier: tightly coupled to specialized, expensive hardware and software components built to “avoid failure,” because the failure of a component typically led to service failure.
Nowadays, consumers still expect “always available” services, but the industry has learned to build systems smarter and more cost-efficiently on commodity hardware. Network Functions Virtualization (NFV) applies the lessons learned from designing and operating large-scale cloud systems to telco services.
At scale, five or six “nines” availability of a single hardware or software component becomes meaningless: the sheer number of components involved means there will inevitably be multiple failures of compute, network and storage components per month. Instead, NFV applications, their management systems and the NFV Infrastructure management platform need to be designed to “tolerate failure.”
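A rough way to see why per-component availability stops being meaningful at scale is to ask how likely it is that every component is up at the same moment. The sketch below assumes components fail independently and that the service needs all of them, which is exactly the “avoid failure” design; the component counts and the function name are illustrative, not figures from any real deployment.

```python
# Probability that all components are up at once, given per-component
# availability. Assumptions: independent failures, and no redundancy
# (the service depends on every single component).


def all_up_probability(component_availability: float, n_components: int) -> float:
    """Return the probability that n independent components are all up."""
    return component_availability ** n_components


FIVE_NINES = 0.99999

for n in (1, 1_000, 10_000, 100_000):
    p = all_up_probability(FIVE_NINES, n)
    print(f"{n:>7} components: all up {p:.2%} of the time")
```

Under these assumptions, 10,000 five-nines components are all up only about 90% of the time, and 100,000 components only about 37% of the time. A design that tolerates individual failures removes this dependence on every component being healthy.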
An OpenStack Platform for Network Functions Virtualization
At Red Hat, we have a long, proud history of building robust and reliable open source systems and offering best-in-class support for them. This has earned us the trust of customers across industries, including telco, who have deployed their mission-critical workloads on our solutions and share our passion for quality engineering.
To make OpenStack, the open source NFV platform of choice, ready for the highly demanding telco environment, we have worked closely with our telco customers and partners and invested significant engineering efforts over the past months. Through rigorous engineering and testing, we have eliminated single points of failure (SPOF). In our OpenStack HA Reference Architecture, all components (control nodes, messaging bus, database, etc.) are fully redundant and provide scale out services. We have also added functionality that detects and isolates faults in compute, network and storage resources and more quickly notifies affected workloads and their management systems so they can mitigate the situation in a timely manner. Further, we have added live upgrades, so that the platform can be updated with zero downtime.
Telco and NFV do not only have mission-critical availability requirements, though. They also have highly stringent performance and predictability requirements. To address these, we have developed several platform enhancements and tunables that give service providers and their vendors tight control over the performance and predictability of their NFV workloads, for example:
- Non-Uniform Memory Access (NUMA) aware scheduling allows applications and management systems to allocate workloads to processor cores considering their internal topology. This way, they can optimize for latency and throughput of memory and network interface accesses and avoid interference from other workloads. Combined with other tuning, we are, for example, able to reach >95% of bare-metal efficiency processing small (64-byte) packets in a DPDK-enabled application within a VM using SR-IOV.
- Real-Time KVM extensions enable NFV workloads with the well-bounded, low-latency (<10μs) response times required for signal processing use cases like Cloud-based Radio Access Networks.
- Open vSwitch enhancements based on the Data-Plane Development Kit (DPDK) enable higher packet throughput between Virtualized Network Functions and network interface cards as well as between Network Functions.
As always with Red Hat, these are not proprietary vendor extensions. We have taken care to work closely with the upstream communities of the projects involved (beyond OpenStack, this includes KVM, libvirt, Linux, OvS, and many others) to ensure that these extensions are now part of mainline development and available to everyone. Only this approach can provide customers with the peace of mind of minimal vendor lock-in and long-term sustainable investments.