If you are a telecommunications provider, you are already using cloud technologies and most likely looking closely at network functions virtualization (NFV). So, you probably know that the move to cloud and NFV infrastructure can complicate the delivery of high-availability services. At the recent OPNFV Summit in Beijing, Red Hat’s Aaron Smith and Pasi Vaananen took the stage to share a proof-of-concept (POC) project that gets right to the heart of the challenge. As it turns out, delivering high-availability services can be done.
“The move to NFV and the cloud infrastructure makes delivering high availability challenging,” Smith, a senior principal software engineer, told attendees. “So, you no longer have these nice vertically integrated pieces of hardware and software, which can all work together. But the same availability requirements still apply.” And in a cloud infrastructure, the network has a disproportionate impact on the availability of the system, because a network fault is not limited to one node or a pair of nodes. There are thousands of nodes connected by the switching infrastructure, Smith said.
During their presentation, the duo outlined the goals of the POC, how they put the POC together, how it worked and what was learned. The project set out to:
- Produce a monitoring and event detection framework that distributes fault information to various listeners throughout the system with low latency (less than tens of milliseconds).
- Provide a hierarchy of remediation controllers which can react quickly (less than tens of milliseconds) to faults.
- Provide fault management mechanisms for both current virtualization environments and future containerization environments orchestrated by Kubernetes, etc.
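To make the first goal concrete, here is a minimal sketch of a publish/subscribe event bus that hands fault events to subscribed listeners and reports the delivery latency for each one. It is not the POC's implementation: the `FaultEvent` and `EventBus` names, the `link_down` event type, and the in-process design are all assumptions made for illustration, and a real system would distribute events across nodes over a message bus.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FaultEvent:
    """A fault notification (hypothetical shape, not the POC's data model)."""
    source: str  # e.g. "node-17/eth0" (illustrative name)
    kind: str    # e.g. "link_down" (illustrative event type)
    detected_at: float = field(default_factory=time.monotonic)


Listener = Callable[[FaultEvent, float], None]


class EventBus:
    """Toy in-process publish/subscribe bus, for illustration only."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Listener]] = {}

    def subscribe(self, kind: str, listener: Listener) -> None:
        """Register a listener for one event type."""
        self._subscribers.setdefault(kind, []).append(listener)

    def publish(self, event: FaultEvent) -> int:
        """Deliver the event to each listener; return how many received it."""
        delivered = 0
        for listener in self._subscribers.get(event.kind, []):
            # Latency from detection to the moment just before delivery.
            latency_ms = (time.monotonic() - event.detected_at) * 1000.0
            listener(event, latency_ms)
            delivered += 1
        return delivered


if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe("link_down", lambda ev, ms: print(f"{ev.source}: {ms:.3f} ms"))
    bus.publish(FaultEvent(source="node-17/eth0", kind="link_down"))
```

Passing the measured latency to each listener makes it easy to check deliveries against a latency budget such as the tens-of-milliseconds target above.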
In developing the POC, the team used the European Telecommunications Standards Institute’s (ETSI) concept of a Fault Management Cycle. “We wanted to quickly figure out the cause of the fault without over-analyzing the problem,” said Vaananen, a systems architect in Red Hat’s NFV office of technology. Based on that model, the POC was designed around the following fault management cycle phases:
- Detection, using low-latency, low-overhead mechanisms
- Localization, by mapping physical and virtualized resources to their resource consumer(s) within the context of fault trees
- Isolation, to remove the ability of a failed component to affect service state
- Remediation, restoring service through failover to a redundant resource or component, or through a component restart
- Recovery, through the restoration of service and redundancy configuration
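The five phases above can be sketched as an ordered sequence, with each phase timed so that a cycle timeline can be built from the results. This is only an illustration under stated assumptions: the `Phase` enum, `next_phase`, and `run_cycle` are hypothetical names, and the handlers stand in for whatever real detection or failover logic a deployment would use.

```python
import time
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Phase(Enum):
    """Fault management cycle phases, in the order listed above."""
    DETECTION = 1
    LOCALIZATION = 2
    ISOLATION = 3
    REMEDIATION = 4
    RECOVERY = 5


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after RECOVERY."""
    phases = list(Phase)  # Enum iteration preserves definition order
    index = phases.index(current)
    return phases[index + 1] if index + 1 < len(phases) else None


def run_cycle(handlers: Dict[Phase, Callable[[], None]]) -> List[Tuple[Phase, float]]:
    """Run one handler per phase in order; return (phase, elapsed_ms) pairs."""
    timeline: List[Tuple[Phase, float]] = []
    phase: Optional[Phase] = Phase.DETECTION
    while phase is not None:
        start = time.perf_counter()
        handlers[phase]()  # placeholder for real per-phase logic
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        timeline.append((phase, elapsed_ms))
        phase = next_phase(phase)
    return timeline


if __name__ == "__main__":
    no_op_handlers = {phase: (lambda: None) for phase in Phase}
    for phase, elapsed_ms in run_cycle(no_op_handlers):
        print(f"{phase.name:<12} {elapsed_ms:.3f} ms")
```

Recording the elapsed time per phase shows which phase consumes most of the time budget for a given fault.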
From there, the team determined a fault management cycle timeline. During the presentation, Vaananen provided specific details on how all that was done, and shared some compelling data points from a late-2016 survey of service providers about telco requirements for NFVi. The survey, commissioned by Red Hat and conducted by Heavy Reading, found that telcos do understand the importance of service availability. Historically, telcos were focused on ensuring that the hardware had five-nines availability, but with the move to NFVi, the focus has moved to high service availability because their customers will not tolerate any downtime, according to the study. In fact, the study found that telcos overwhelmingly agree that high service availability is more important than hardware availability for NFV, with 97% agreeing with that statement.
Vaananen stressed that when it comes to NFVi, there are only a few ways for things to work right, and an infinite number of things that can go wrong, so “it is important to determine where best to put resources to make the biggest impact on service availability.” Different infrastructure components have different potential to affect application-level service availability. Network switch faults, for example, have a very high impact potential because they can affect all associated nodes and services.
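A small sketch shows why a switch fault has more impact potential than a single node fault: it maps a failed component to the services that depend on it. The topology, the node and switch names, and the service names (`vFW`, `vEPC`, `vIMS`) are invented for illustration and do not come from the POC.

```python
from typing import Dict, List, Set

# Hypothetical topology: which nodes hang off each switch,
# and which services run on each node.
SWITCH_TO_NODES: Dict[str, List[str]] = {
    "switch-a": ["node-1", "node-2", "node-3"],
}
NODE_TO_SERVICES: Dict[str, List[str]] = {
    "node-1": ["vFW"],
    "node-2": ["vFW", "vEPC"],
    "node-3": ["vIMS"],
}


def affected_services(component: str) -> Set[str]:
    """Return the services that a fault in `component` could affect.

    A switch fault reaches every node attached to the switch; any other
    name is treated as a single node.
    """
    nodes = SWITCH_TO_NODES.get(component, [component])
    services: Set[str] = set()
    for node in nodes:
        services.update(NODE_TO_SERVICES.get(node, []))
    return services


if __name__ == "__main__":
    for component in ("node-3", "switch-a"):
        print(component, sorted(affected_services(component)))
```

In this toy topology a fault on `node-3` touches one service, while a fault on `switch-a` touches every service running behind it. The same resource-to-consumer mapping is what the localization phase relies on.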
With that in mind, the team’s POC focused on demonstrating that events, such as node network interface failures on a host, could be detected, and that event messages could be delivered to subscribed components with consistently low latency. It was also designed to show that an application could be enhanced to subscribe to and receive events, and to confirm that the collectd framework is suitable for event monitoring. Finally, the POC was built to prototype integration with OpenStack services and to prototype a node/switch monitoring system that could provide quick detection without adding significant overhead.
According to Smith, the POC achieved much of what it set out to do. Specifically, the project was able to demonstrate that the events could be detected. While there’s still more work to be done on the delivery of event messages, the POC also demonstrated that this could be done. As for showing that telco and enterprise apps can be enhanced to subscribe to and receive events, and that the collectd framework is suitable for event monitoring, that work is still in progress. “Low latency is achievable, but the issues of scale and security need to be addressed,” Smith said, adding that work continues to advance these efforts. In fact, within the OPNFV organization, there’s work on a common data/object model for events and telemetry that could be used to advance high-availability services.
“We feel good about what we actually attempted to do. It’s doable,” Smith told attendees. “And the next step is to make sure that all the pieces come in to be able to do the rest of it.”
Dig deeper into the POC in this video of the full presentation, and let us know your thoughts on delivering high-availability services using NFV and cloud in the comments section below!