Improving the scaling capabilities of Red Hat OpenStack Platform is an important part of product development. One of the features that helps simplify resource management is called Cells. Simply put, Cells takes a distributed approach to management in order to support large-scale deployments. In this post we'll look at how to use OpenStack Platform 16 with Cells.

Previously, our team of performance and scale engineers described the process of scaling OSP to more than 500 nodes. With Red Hat OpenStack Platform 16 we introduced full support for multi-cell deployments with Nova's Cells v2 feature, which helps operators manage more compute resources within the same region than was possible before.

Previously, deployments were limited to a single cell, which meant that managing large environments required tuning Director parameters and, therefore, more resources. In Red Hat OpenStack Platform 16, a multi-cell deployment allows each cell to be managed independently. In this article we describe our process of testing multi-cell deployments at scale with Red Hat OpenStack Platform 16.

Cells v2

Cells v2 is an OpenStack Nova feature that improves Nova's scaling capability in Red Hat OpenStack Platform. Each cell has a separate database and message queue, which increases performance at scale. Operators can deploy additional cells to handle large deployments and, compared to regions, this provides access to a large number of compute nodes through a single API.

Each cell has its own cell controllers, which run the database server and RabbitMQ along with the Nova Conductor services.

The main control nodes still run Nova Conductor services, which we call the “super-conductor”.

Services on cell controllers can still call the Placement API, but they cannot access any other API-layer services via RPC, nor do they have access to the global API databases on the control nodes.
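
For example, once cells are deployed, the cell layout can be inspected from one of the main controllers. A minimal check (the nova_api container name matches what OSP 16 deploys, but verify it in your environment):

# Run on a main (overcloud) controller: list the cell mappings stored in the
# API database. With --verbose, each cell shows its own database connection
# and transport (message queue) URL.
sudo podman exec -it nova_api nova-manage cell_v2 list_cells --verbose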

Multiple cells

Cells provide a strategy for scaling Nova: each cell has its own message queue and database for Nova, creating separate failure domains for groups of compute nodes.

Figure: Cells v2 services

A single-cell Cells v2 deployment has been the default since Red Hat OpenStack Platform 13, but multi-cell deployments were not managed by Director until now.

What are the benefits of a Cells v2 multi-cell deployment over the flat deployment?

  • Fault domains: database and queue failures affect a smaller piece of the deployment

  • It leverages a small, read-heavy API database that is easier to replicate

  • High availability is now possible

  • Simplified, better-organized replication of databases and queues

  • Cache performance is improved

  • Fewer compute nodes per queue and per database

  • A single, unique set of endpoints for large regions

Lab architecture

Details of our lab inventory are shown below.

+------------------------------------+-------+
| Node type                          | Count |
+------------------------------------+-------+
| Total bare metal physical servers  | 128   |
| Director                           | 1     |
| Minion                             | 1     |
| Rally                              | 1     |
| Controllers                        | 24    |
| Remaining (virtualcompute hosts)   | 100   |
| virtualcompute (VMs)               | 1000  |
+------------------------------------+-------+

Each of the bare metal nodes had the following specifications:

  • 2x Intel Xeon E3-1280 v6 (4 total cores, 8 threads)

  • 64 GB RAM

  • 500 GB SSD storage

We created a virtualcompute role that mirrors the standard Compute role but allows us to differentiate bare metal nodes from VM-based compute nodes.
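
A rough sketch of how such a role can be created (the paths, file names, and role list below are illustrative assumptions, not our exact templates; our deployment named the role virtualcompute): copy the default Compute role definition, rename it, and regenerate the roles data used for the cell stacks.

# Sketch: derive a virtualcompute-style role from the default Compute role.
# Paths and names are placeholders; adjust to your own template layout.
mkdir -p ~/templates/roles
cp /usr/share/openstack-tripleo-heat-templates/roles/*.yaml ~/templates/roles/
cp ~/templates/roles/Compute.yaml ~/templates/roles/VirtualCompute.yaml
# Edit VirtualCompute.yaml: change "name: Compute" to the new role name and
# adjust HostnameFormatDefault so VM-based computes are easy to tell apart.

# Regenerate the roles data file, including the new role, for the cell stacks.
openstack overcloud roles generate \
  --roles-path ~/templates/roles \
  -o ~/templates/cell_roles_data.yaml \
  CellController VirtualCompute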

Deployment - virtualcompute role

We used VMs together with the Virtual Baseboard Management Controller (VBMC) service to make efficient use of lab hardware and simulate 1,000 compute nodes. VBMC allows you to register VMs with Director and manage them as if they were bare metal nodes through IPMI (PXE boot, power on, power off) during deployment. The registered nodes were then deployed as compute nodes using the virtualcompute role.
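
As an illustration of this approach (host names, ports, MAC addresses, and credentials below are placeholders), each compute VM is exposed as an IPMI endpoint and then registered with Director like any other node:

# Expose a libvirt VM as an IPMI-manageable node via Virtual BMC.
vbmc add compute-vm-0 --port 6230 --username admin --password password \
    --libvirt-uri qemu+ssh://root@hypervisor-01/system
vbmc start compute-vm-0
vbmc list    # confirm the virtual BMC is running

# Register the VM with Director through a standard instackenv.json entry that
# points the IPMI address at the VBMC host and port defined above.
cat > instackenv.json <<'EOF'
{
  "nodes": [
    {
      "name": "compute-vm-0",
      "pm_type": "ipmi",
      "pm_addr": "hypervisor-01",
      "pm_port": "6230",
      "pm_user": "admin",
      "pm_password": "password",
      "mac": ["52:54:00:aa:bb:01"]
    }
  ]
}
EOF
openstack overcloud node import instackenv.json
openstack overcloud node introspect --all-manageable --provide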

Compute Nodes Setup

Figure: compute nodes setup

Scaling Director - Minion

Because we had to manage a large environment with limited resources for Director (only 4 cores and 64 GB RAM), we decided to scale Director using the Minion feature.

Note: Scaling Director with Minions is currently at the Technology Preview support level in Red Hat OpenStack Platform 16.0.

Minions are additional nodes that can be deployed to offload the main Director. Minions usually run the heat-engine and ironic-conductor services, but in our case we used only heat-engine on the minions.
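
A minimal sketch of the minion setup we used (file names follow the standard minion installation flow; the minion.conf option names are our assumption, so verify them against the sample configuration shipped with your release):

# On Director: copy the undercloud output and password files that the minion
# installer consumes.
scp ~/tripleo-undercloud-outputs.yaml ~/tripleo-undercloud-passwords.yaml \
    stack@minion-host:~/

# On the minion node: enable only heat-engine. Option names are assumptions;
# check the minion.conf sample from python-tripleoclient for your release.
cat > ~/minion.conf <<'EOF'
[DEFAULT]
minion_local_interface = eth1
enable_heat_engine = true
enable_ironic_conductor = false
EOF

openstack undercloud minion install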

Figure: Director and minion services

The following graph shows the CPU utilization of the processes running on Director during deployment. As illustrated, by moving the load generated by heat-engine and ironic-conductor to minion hosts, we can free CPU time for the other services that run during Heat stack creation.

Figure: CPU utilization of Director services during deployment

Deployment model

In Red Hat OpenStack Platform 16 we deploy each cell as a separate stack (usually one stack per cell, but we can deploy more using the split cell controller feature). This improves deployment time and day-two operations, such as adding or removing compute nodes, since we no longer need to update a single stack containing a large number of compute nodes.

(undercloud) [stack@f04-h17-b01-5039ms ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+----------------------+
| ID                                   | Stack Name | Project                          | Stack Status       | Creation Time        | Updated Time         |
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+----------------------+
| 47094df3-f8d1-4084-a824-09009ca9f26e | cell2      | ab71a1c94b374bfaa256c6b25e844e04 | UPDATE_COMPLETE    | 2019-12-17T21:39:44Z | 2019-12-18T03:52:42Z |
| c71d375d-0cc3-42af-b37c-a4ad2a998c59 | cell3      | ab71a1c94b374bfaa256c6b25e844e04 | CREATE_COMPLETE    | 2019-12-17T13:50:02Z | None                 |
| dc13de3e-a2d0-4e0b-a4f1-6c010b072c0d | cell1      | ab71a1c94b374bfaa256c6b25e844e04 | UPDATE_COMPLETE    | 2019-12-16T18:39:33Z | 2019-12-18T09:28:31Z |
| 052178aa-506d-4c33-963c-a0021064f139 | cell7-cmp  | ab71a1c94b374bfaa256c6b25e844e04 | CREATE_COMPLETE    | 2019-12-11T22:08:11Z | None                 |
| ab4a5258-8887-4631-b983-ac6912110d73 | cell7      | ab71a1c94b374bfaa256c6b25e844e04 | UPDATE_COMPLETE    | 2019-11-25T14:12:56Z | 2019-12-10T13:21:48Z |
| 8e1c2f9c-565b-47b9-8aea-b8145e9aa7c7 | cell6      | ab71a1c94b374bfaa256c6b25e844e04 | UPDATE_COMPLETE    | 2019-11-25T11:26:09Z | 2019-12-14T09:48:49Z |
| d738f84b-9a38-48e8-8943-1bcc05f725ec | cell5      | ab71a1c94b374bfaa256c6b25e844e04 | UPDATE_COMPLETE    | 2019-11-25T08:41:48Z | 2019-12-13T16:18:59Z |
| 1e0aa601-2d79-4bc3-a849-91f87930ec4c | cell4      | ab71a1c94b374bfaa256c6b25e844e04 | UPDATE_COMPLETE    | 2019-11-25T05:57:44Z | 2019-12-09T13:33:11Z |
| 250d967b-ad60-40ce-a050-9946c91e9810 | overcloud  | ab71a1c94b374bfaa256c6b25e844e04 | UPDATE_COMPLETE    | 2019-11-13T10:13:48Z | 2019-11-19T10:57:05Z |
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+----------------------+
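
The per-cell stacks listed above were created with the standard multi-cell workflow: export the control plane information from the main overcloud stack, then deploy each cell as its own stack. A minimal sketch (environment file names are placeholders, and exact arguments can vary between releases, so check the command help):

# Export control plane information from the overcloud stack for reuse by the
# cell stack (see `openstack overcloud cell export --help` for exact arguments).
openstack overcloud cell export --output-file cell1-ctrl-input.yaml cell1

# Deploy the cell as its own Heat stack with its own roles and parameters
# (cell1-parameters.yaml is a placeholder holding node counts and cell settings).
openstack overcloud deploy --templates \
  --stack cell1 \
  -r ~/templates/cell_roles_data.yaml \
  -e cell1-ctrl-input.yaml \
  -e ~/templates/cell1-parameters.yaml

# After the cell's compute nodes are up, map them into the cell database from
# one of the main controllers.
sudo podman exec -it nova_api nova-manage cell_v2 discover_hosts --by-service --verbose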

Deployment time

We managed to deploy 1,000 nodes within a single region. The deployment model used in Red Hat OpenStack Platform 16 for multi-cell deployments scaled without issues, and the time to deploy was consistent regardless of the total number of nodes:

  • ~360 minutes to deploy a cell with 3 bare metal CellControllers and 100 virtualcompute nodes from scratch.

  • ~190 minutes to scale the cell out from 100 to 150 virtualcompute nodes (see the scale-out sketch below).
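
Scaling a cell out touches only that cell's stack. A minimal sketch, assuming a VirtualCompute-style role and a cell1-parameters.yaml environment file (both placeholders) that carries the role count:

# Raise the role count in the cell's own environment file and re-run the same
# deploy command for that stack. "VirtualComputeCount" follows the standard
# <RoleName>Count convention for a custom role.
sed -i 's/VirtualComputeCount: 100/VirtualComputeCount: 150/' ~/templates/cell1-parameters.yaml

openstack overcloud deploy --templates \
  --stack cell1 \
  -r ~/templates/cell_roles_data.yaml \
  -e cell1-ctrl-input.yaml \
  -e ~/templates/cell1-parameters.yaml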

Details of the deployed 1,000 node environment are shown below.

Deployment check: CLI

(overcloud) [stack@f04-h17-b01-5039ms ~]$ openstack hypervisor stats show
+----------------------+---------+
| Field                | Value   |
+----------------------+---------+
| count                | 1000    |
| current_workload     | 0       |
| disk_available_least | 21989   |
| free_disk_gb         | 29000   |
| free_ram_mb          | 1700017 |
| local_gb             | 29000   |
| local_gb_used        | 0       |
| memory_mb            | 5796017 |
| memory_mb_used       | 4096000 |
| running_vms          | 0       |
| vcpus                | 2000    |
| vcpus_used           | 0       |
+----------------------+---------+

(overcloud) [stack@f04-h17-b01-5039ms ~]$ openstack compute service list | grep nova-compute | grep up | wc -l
1000

(undercloud) [stack@f04-h17-b01-5039ms ~]$ openstack server list --name compute -f value | wc -l
1000

(overcloud) [stack@f04-h17-b01-5039ms ~]$ openstack availability zone list
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| internal  | available   |
| cell7     | available   |
| cell5     | available   |
| cell1     | available   |
| cell4     | available   |
| cell6     | available   |
| cell3     | available   |
| nova      | available   |
| cell2     | available   |
| nova      | available   |
+-----------+-------------+

(overcloud) [stack@f04-h17-b01-5039ms ~]$ openstack aggregate list
+----+-------+-------------------+
| ID | Name  | Availability Zone |
+----+-------+-------------------+
|  2 | cell1 | cell1             |
|  5 | cell2 | cell2             |
|  8 | cell3 | cell3             |
| 10 | cell4 | cell4             |
| 13 | cell5 | cell5             |
| 16 | cell6 | cell6             |
| 19 | cell7 | cell7             |
+----+-------+-------------------+

Deployment check: Dashboard

OpenStack Deployment dashboard

Figure: Dashboard view of the aggregates and availability zones

Multi-cell deployment testing

We tested the functionality of the multi-cell overcloud at 500, 800, and 1,000 node counts.

With 500 nodes running in 6 cells, we did not encounter any problems with the overcloud. We ran the following Rally test to generate an intensive load on the overcloud.

test scenario NovaServers.boot_and_delete_server
args position 0
args values:
{
  "args": {
    "flavor": {
      "name": "m1.tiny"
    },
    "image": {
      "name": "^cirros-rally$"
    },
    "force_delete": false
  },
  "runner": {
    "times": 10000,
    "concurrency": 20
  },
  "contexts": {
    "users": {
      "tenants": 3,
      "users_per_tenant": 10
    }
  },
  "sla": {
    "failure_rate": {
      "max": 0
    }
  },
  "hooks": []
}
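
For reference, the scenario above corresponds to a standard Rally task. A minimal sketch of the same scenario as a Rally v1 JSON task file, run with rally task start (the image and flavor names must already exist in the cloud):

# The scenario above expressed as a Rally v1 JSON task file.
cat > boot-and-delete.json <<'EOF'
{
  "NovaServers.boot_and_delete_server": [
    {
      "args": {
        "flavor": {"name": "m1.tiny"},
        "image": {"name": "^cirros-rally$"},
        "force_delete": false
      },
      "runner": {"type": "constant", "times": 10000, "concurrency": 20},
      "context": {"users": {"tenants": 3, "users_per_tenant": 10}},
      "sla": {"failure_rate": {"max": 0}}
    }
  ]
}
EOF

rally task start boot-and-delete.json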

Results of Rally tests - Response Times (sec):

Action | Min (sec) | Median (sec) | 90%ile (sec) | 95%ile (sec) | Max (sec) | Avg (sec) | Success | Count |
nova.boot_server | 8.156 | 8.995 | 10.984 | 11.377 | 28.16 | 9.687 | 100.0% | 10000 |
nova.delete_server | 0.717 | 2.32 | 2.411 | 2.462 | 4.955 | 2.34 | 100.0% | 9999 |
total | 8.959 | 11.335 | 13.331 | 13.754 | 30.431 | 12.027 | 100.0% | 10000 |
 -> duration | 7.959 | 10.335 | 12.331 | 12.754 | 29.431 | 11.027 | 100.0% | 10000 |
 -> idle_duration | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 100.0% | 10000 |

During the Rally tests, the load generated on the control plane was handled without errors, and we did not observe any anomalies in the performance metrics.

Figure: control plane performance metrics during the Rally tests

Above 800 nodes, we noticed flapping OVN agents, which caused problems with scheduling. This was most likely due to an overloaded ovsdb-server, which we relieved by tuning OVN-related parameters.
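
As an illustration only (these are not tuned recommendations, and we are still validating values), this kind of flapping is typically addressed by raising the ovn-controller probe intervals so agents are not marked down while ovsdb-server is busy; in a Director-managed environment the equivalent settings would normally be applied through tripleo-heat-templates parameters rather than by hand:

# Raise ovn-controller probe intervals on a compute node. Values are
# illustrative only. ovn-remote-probe-interval is in milliseconds,
# ovn-openflow-probe-interval is in seconds.
sudo ovs-vsctl set open_vswitch . external_ids:ovn-remote-probe-interval=180000
sudo ovs-vsctl set open_vswitch . external_ids:ovn-openflow-probe-interval=60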

We are still working on parameter tuning to prepare best practices and a complete set of recommended parameters for large multi-cell deployments. However, operators might want to consider using one of the available SDN solutions when deploying a multi-cell environment with more than 500 nodes.