The beauty of open source software is that anyone can download the code and then install, configure and use it. The challenge is doing so in a production environment at scale. That's the challenge communications service providers (CSPs) face as they build out platforms to operate cloud environments that serve millions of users. These cloud platforms include compute, networking and storage hardware as well as software to automate, manage, monitor and secure the platform.

While a CSP can download the many open source software projects required to build out a platform, the provider also has to:

  • Integrate the open source projects.
  • Ensure the software works with their hardware.
  • Automate processes and manage the environment.
  • Comply with regulations.
  • Manage upstream contributions and self-support unmerged features and code.

Accomplishing all this requires an investment in the upstream open source community and a solid team consisting of highly skilled administrators, developers and project managers.

At the November 2017 OpenStack Summit in Sydney, a panel titled "I pity the fool that builds his own cloud! – Overcoming challenges of OpenStack based telco clouds" featured cloud architects and engineers from Ericsson, AT&T, NTT and SKT, who described the challenges their organizations faced while building OpenStack clouds themselves. Some highlights from the panel:

  1. "Troubleshooting is not just OpenStack. It is [the] operating system, DPDK, SRIOV, … Engineers need to know all the systems."
  2. Bugs need to be reported upstream, but the community often responds: "It works in our devstack it must be your environment... integrated testing is lacking."
  3. Hiring people with the right skills is hard. Developers and engineers want to work for gaming or Internet companies, not telcos.

Rather than go it alone, CSPs can and should lean on their OpenStack technology partners. For example, when deploying network functions virtualization (NFV) from Red Hat, you get a tested, integrated and supported NFV infrastructure (NFVi) solution consisting of:

  • Red Hat OpenStack Platform
  • Red Hat Enterprise Linux
  • Red Hat Virtualization (based on the KVM hypervisor)
  • Red Hat Ceph Storage
  • Red Hat Ansible Automation
  • Certified and validated partner software-defined network (SDN) plugins, virtual network functions (VNFs) and management and orchestration (MANO)

Red Hat engineers possess skills in Linux, KVM virtualization, Ceph storage, networking, Ansible automation, containers and more. They are able to troubleshoot issues in a deployment, and are equipped to duplicate an environment and debug across all projects and components. Red Hat can solve problems across the entire stack, as the following examples from our knowledge base show. Compare them to a community response: "... it works in devstack. It must be your environment."

  • Support identified issues with VFs not supporting changes to their MAC addresses and created an environment to reproduce the issue. Meanwhile, engineering released a series of complicated kernel and QEMU patches as hotfixes for the compute nodes; these were later released as errata.
  • The customer saw guest instance panics occurring systemically across their production environment, causing critical impact. The investigation led right into QEMU's AIO code on compute nodes and required analysis at the guest instance level (sosreports and vmcores) as well as of libvirt, QEMU and the kernel. Support performed some excellent analysis to find a subtle race condition between the worker thread and the main thread of qemu-kvm.
  • Red Hat support worked with the customer to identify the root cause of a failed active compute node, and then with engineering to backport an upstream fix for Heat.

Red Hat can provide this support because of the process it uses to create production-ready software:

Also, Red Hat does not work alone. We have a large and rich partner ecosystem to provide open and tested solutions. And, we offer a customer portal that clients can use to self-serve many of their requests. It's a well-used tool; one customer accessed the portal for 3,126 pages of documentation, had 299 discussions, tapped into the knowledge base 5,778 times and reviewed 163 product pages in just three months!

Production environments also have to be more secure. Below is a snapshot of security vulnerabilities and advisories resolved by Red Hat in 2016 across Linux, OpenStack, storage and middleware.

Red Hat also represents customers in the OpenStack communities in order to get their features into the upstream (as documented in this process). Below is an example of RFEs requested by a customer, which Red Hat will strive to fulfill first in the upstream community and then test and validate in Red Hat OpenShift Container Platform.

Likewise, lifecycle is very important for a production environment. Red Hat products are supported for years, for bug fixes and backports. For example, Red Hat Enterprise Linux is supported for 10+ years, and Red Hat OpenStack Platform is supported for up to five years. Other Red Hat products have similar lifecycles.

In summary, building a cloud yourself entails hiring hard-to-find engineers and relying on the community for support, with no SLA, no ability to replicate your environment, no lifecycle support and no representative in the community to get your feature requests included. The better alternative: use the Red Hat integrated and tested solution, backed by our certified partner ecosystem and our nearly 25 years of open source experience troubleshooting problems across the stack.


About the author

Jonathan Gershater joined Red Hat in 2013. Prior to Red Hat, Gershater worked at Trend Micro, Sun Microsystems, Entrust Technologies and 3Com. At Red Hat, Gershater leads market analysis for Red Hat’s open hybrid cloud platform, OpenShift, and related technologies.
