The Road To High Availability for OpenStack

April 16, 2014 | Arthur Berezin | 2-minute read

Why Is OpenStack High Availability Important?
Many organizations choose OpenStack for its distributed architecture and its ability to deliver an Infrastructure-as-a-Service environment for scale-out applications, whether in private on-premises clouds or public clouds. It is quite common for OpenStack to run mission-critical applications. OpenStack itself is commonly deployed in a Controller/Network-Node/Compute layout, where the controller runs management services such as nova-scheduler, which determines how to dispatch compute resources, and Keystone, which handles authentication and authorization for all services.

Although a failure of the controller node would not disrupt application workloads already running on top of OpenStack, organizations running production applications need the control plane of their cloud to provide 99.999% uptime. That means deploying the controller in a highly available configuration, so that OpenStack services are accessible at all times and applications can scale out or scale in according to workload.
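To put the 99.999% figure in concrete terms, the short Python sketch below converts an availability percentage into a yearly downtime budget. The second function is an illustrative assumption rather than something claimed in this post: it estimates the combined availability of redundant controllers under the simplifying premise that nodes fail independently. Both function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def redundant_availability(node_availability: float, nodes: int) -> float:
    """Availability (0..1) of a group that is up while at least one node is up.

    Assumes independent node failures, which real clusters only approximate.
    """
    return 1 - (1 - node_availability) ** nodes


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% uptime allows about "
              f"{downtime_budget_minutes(target):.2f} minutes of downtime per year")
    print(f"Two independent 99% nodes: {redundant_availability(0.99, 2):.4%}")
```

Five nines works out to a little over five minutes of downtime per year, a budget that a single controller node is unlikely to meet once hardware failures and maintenance windows are counted.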

Addressing High Availability Needs
A highly available controller for OpenStack can be deployed in various configurations. Each serves a certain set of demands and introduces a growing set of prerequisites. An OpenStack environment consists of stateless, shared-nothing services that serve their APIs (Keystone, Glance, Swift, nova-scheduler, nova-api, Neutron, Horizon, Heat, Ceilometer, etc.) and the underlying infrastructure components that OpenStack services use to communicate and store persistent data: a MariaDB database and a RabbitMQ message broker for inter-service communication.

The availability and uptime of OpenStack services can be maintained with a fairly simple Active/Passive cluster configuration and a virtual IP address that forwards traffic to the active node. As the load and demand on OpenStack services grow, organizations want to be able to add nodes and scale out the control plane. Building a scale-out controller requires setting up all services and infrastructure components (database and message broker) in an Active/Active configuration, confirming that more nodes can be added to the cluster as load grows, and balancing the API request load between the nodes.
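The difference between the two layouts can be illustrated with a small Python simulation. This is a minimal sketch, not how Pacemaker or HAProxy are implemented: the `Node` class, the node names, and both selection functions are hypothetical. In Active/Passive, the virtual IP points at one healthy node and moves only on failure; in Active/Active, requests are spread across every healthy node.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, List


@dataclass
class Node:
    name: str
    healthy: bool = True


def active_passive_target(nodes: List[Node]) -> Node:
    """Return the node holding the virtual IP: the first healthy node in priority order."""
    for node in nodes:
        if node.healthy:
            return node
    raise RuntimeError("no healthy controller available")


def make_round_robin(nodes: List[Node]) -> Callable[[], Node]:
    """Return a picker that rotates requests across all currently healthy nodes."""
    counter = count()

    def pick() -> Node:
        healthy = [n for n in nodes if n.healthy]
        if not healthy:
            raise RuntimeError("no healthy controller available")
        return healthy[next(counter) % len(healthy)]

    return pick


if __name__ == "__main__":
    controllers = [Node("ctrl1"), Node("ctrl2"), Node("ctrl3")]
    pick = make_round_robin(controllers)
    print("active/passive:", active_passive_target(controllers).name)
    print("active/active: ", [pick().name for _ in range(6)])

    controllers[0].healthy = False  # simulate a controller failure
    print("after failure, active/passive:", active_passive_target(controllers).name)
    print("after failure, active/active: ", [pick().name for _ in range(4)])
```

In the Active/Passive case, adding a node only adds a standby, so capacity stays flat. In the Active/Active case, every healthy node added to the list immediately takes a share of the requests, which is what makes scale-out possible.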

High Availability Architecture for RHEL-OSP (Red Hat Enterprise Linux OpenStack Platform)
We are investing heavily in providing a fully supported, out-of-the-box Active/Active high availability solution for OpenStack services and underlying infrastructure components, based on mature, industry-proven open source technologies. In a multi-controller layout, services run on all controller nodes in a highly available clustered configuration.

The OpenStack Platform 5.0 high availability solution uses Pacemaker to construct Active/Active clusters for the OpenStack services and the HAProxy load balancer. API calls are load balanced through clustered HAProxy in front of the services, where every service has its own virtual IP (VIP). This setup makes it easy to customize layouts and segregate services as needed. Galera, a synchronous multi-master cluster for the MariaDB database, synchronizes the persistent data layer across the running database nodes. This enables Active/Active scale-out of the database layer without requiring shared storage.
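The one-VIP-per-service layout can be sketched by generating HAProxy `listen` sections from a service table. This is an illustration under stated assumptions, not the configuration shipped with OpenStack Platform 5.0: the VIPs and controller addresses are made-up values from documentation address ranges, the ports are the common Keystone and Glance API defaults, and `render_listen` is a hypothetical helper. Only standard HAProxy directives (`listen`, `bind`, `balance`, `server ... check`) are used.

```python
from typing import Dict, List, Tuple

# Hypothetical controller nodes (addresses from the 198.51.100.0/24 documentation range).
CONTROLLERS: Dict[str, str] = {
    "ctrl1": "198.51.100.11",
    "ctrl2": "198.51.100.12",
    "ctrl3": "198.51.100.13",
}

# Service name -> (assumed VIP, API port). Values are illustrative only.
SERVICES: Dict[str, Tuple[str, int]] = {
    "keystone": ("192.0.2.10", 5000),
    "glance-api": ("192.0.2.11", 9292),
}


def render_listen(service: str, vip: str, port: int, backends: Dict[str, str]) -> str:
    """Render one HAProxy listen section that balances a VIP across all backends."""
    lines: List[str] = [
        f"listen {service}",
        f"    bind {vip}:{port}",
        "    balance roundrobin",
    ]
    for name, address in backends.items():
        # "check" enables health checks so failed controllers are taken out of rotation.
        lines.append(f"    server {name} {address}:{port} check")
    return "\n".join(lines)


if __name__ == "__main__":
    sections = [
        render_listen(service, vip, port, CONTROLLERS)
        for service, (vip, port) in SERVICES.items()
    ]
    print("\n\n".join(sections))
```

Because each service binds its own VIP, a single service can be pointed at a different set of backends by changing only its own section, which is the kind of layout customization and service segregation described above.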

Out of the Box with the Foreman OpenStack Installer
To make the high availability solution for OpenStack Platform 5.0 extremely easy to consume and set up, we are fully integrating it with a project named Staypuft. Staypuft is a Foreman user interface plugin that aims to make it easy to deploy complex production OpenStack environments. Staypuft will be delivered as part of the Foreman OpenStack Installer for OpenStack Platform 5.0.