How to manage your cloud platform's lifecycle
Keeping a private cloud platform up to date can seem deceptively simple, as simple as running the platform's "upgrade" component. Done poorly, however, upgrades can become the platform's Achilles' heel, and fixing what goes wrong can consume your whole team's energy.
In the first article in this series, we highlighted various aspects of cloud architecture; in the second article, we described self-service delivery; in the third article, we explored operations; and in the fourth article, we tackled resource and capacity management. This article discusses the elements of successful lifecycle management for a private cloud platform.
What is lifecycle management?
Lifecycle management covers how the cloud environment's core processes, services, operations, and support components are configured and kept current. Private cloud technologies have short release cycles that make it challenging for enterprises to stay current. For example, Kubernetes publishes several feature releases a year while rolling out patches and security updates continuously.
It is possible to lighten the burden of ongoing upgrades on your organization by using commercially supported versions of cloud platforms that provide longer-term release cycles.
As we wrote in the first article, Capability Maturity Model Integration (CMMI) provides a framework for the maturity of the processes that combine the people, procedures, and tools to deliver capabilities. The CMMI process areas most relevant to lifecycle management are:
- Service Continuity (SCON)
- Work Monitoring and Control (WMC)
- Work Planning (WP)
- Supplier Agreement Management (SAM)
The need to stay current requires that cloud delivery teams adopt development practices such as automated testing, continuous integration, and infrastructure-as-code. You can no longer manage general upgrades and patches as major projects.
Teams require development, test, quality assurance, and production environments that accurately reflect the functional production configuration. It is important to match as many aspects as possible, including service topologies, encryption methods, and hardware configuration down to network interface cards. This recommendation sets aside practical constraints around cost and sizing, so the non-production environments can be smaller than production as long as they are configured the same way. Even so, the debugging time saved and the outages avoided will offset the extra hardware costs.
Automate the upgrade lifecycle
All changes related to the environment, including patching, configuration, monitoring, and support utilities, should be done through automated deployment processes. Cloud platforms such as OpenStack and Kubernetes are API-driven, which allows teams to build test automation into those processes.
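As one illustration of a check that an automated upgrade process might run before touching an environment, the sketch below plans an upgrade as a series of single minor-version steps. It assumes a platform that should be upgraded one minor version at a time within the same major version; the function names and the version format are hypothetical and do not come from any platform's API.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Parse 'MAJOR.MINOR[.PATCH]' (optional leading 'v'); the patch level is ignored."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def plan_upgrade_path(current: str, target: str) -> List[str]:
    """Return the minor versions to step through, oldest first.

    Assumption (hypothetical policy): upgrades stay within one major
    version and never skip a minor version.
    """
    cur_major, cur_minor = parse_version(current)
    tgt_major, tgt_minor = parse_version(target)
    if cur_major != tgt_major:
        raise ValueError("major-version upgrades need a separate plan")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not handled by this sketch")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]
```

A pipeline could run each step in the returned list through the test environment before applying it to production, which keeps every upgrade small enough to treat as routine work rather than a project.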
Sandbox environments serve as development environments for cloud engineers. This is where the engineer develops changes and builds unit tests. You should then manage the automated changes and unit tests in a Git version control repository. Once ready, these changes can be promoted to the test, quality assurance, and production environments according to continuous integration and continuous delivery (CI/CD) practices. Initially, you may still want to approve the promotion from one environment to the next manually.
Any manual steps that remain in this process should either be automated or governed by strong assurance processes, so that they are completed on time and as intended.
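The promotion flow described above can be sketched in a few lines. This is a minimal illustration, not a CI/CD tool: the environment names follow this article, while `deploy`, `run_tests`, and `approve` are hypothetical callables that you would back with your own pipeline, test suite, and manual sign-off.

```python
from typing import Callable, List

# Promotion order, taken from the environments named in this article.
ENVIRONMENTS = ["sandbox", "test", "qa", "production"]


def promote(
    commit: str,
    deploy: Callable[[str, str], None],
    run_tests: Callable[[str, str], bool],
    approve: Callable[[str, str], bool],
) -> List[str]:
    """Move a change through the environments in order.

    Returns the environments the change was deployed to. Promotion stops
    when an approval is withheld or when the tests fail in an environment.
    """
    reached: List[str] = []
    for env in ENVIRONMENTS:
        if not approve(commit, env):    # manual gate (hypothetical hook)
            break
        deploy(commit, env)             # stand-in for the real deployment
        reached.append(env)
        if not run_tests(commit, env):  # a failure blocks later environments
            break
    return reached
```

Keeping approval as a separate hook means a team can start with a human decision at every stage and later replace it with a function that always returns `True` for the environments it trusts.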
Manage configuration drift
When you manage your configuration in a version control repository, you can adopt GitOps. GitOps requires your team to capture all configuration in a Git repository and to use automation that can create complete environments from that configuration at any time.
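To make the idea of drift concrete, the following sketch compares a desired configuration, as it might be loaded from a Git repository, with the configuration observed in a running environment. It is only an illustration under simple assumptions: both configurations are nested dictionaries, and the names `flatten`, `detect_drift`, and `MISSING` are hypothetical.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # marker for a setting present on only one side


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dictionaries into a flat map of dotted paths to values."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def detect_drift(desired: Dict[str, Any], live: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (desired_value, live_value)} for every setting that differs."""
    want, have = flatten(desired), flatten(live)
    drift: Dict[str, Tuple[Any, Any]] = {}
    for path in sorted(want.keys() | have.keys()):
        wanted, actual = want.get(path, MISSING), have.get(path, MISSING)
        if wanted != actual:
            drift[path] = (wanted, actual)
    return drift
```

An empty result means the environment matches the repository. Anything else is either an unrecorded manual change to revert or a change that still needs to be committed.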
Advanced cloud platform teams can destroy and recreate clusters periodically, which is sometimes called Phoenix clustering. Phoenix clusters counter configuration drift, ensure that you can respond to infrastructure disasters, and serve as a training exercise for new hires. This practice is dependent on mastery of infrastructure-as-code and automation within your team.
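A periodic rebuild needs some rule for deciding which clusters are due. The sketch below shows one possible rule, based purely on cluster age; the function name, the inventory format, and the idea of a fixed maximum age are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def clusters_due_for_rebuild(
    created_at: Dict[str, datetime],
    max_age: timedelta,
    now: datetime,
) -> List[str]:
    """Return the names of clusters whose age has reached max_age, sorted by name.

    created_at maps a cluster name to its creation time (hypothetical inventory).
    """
    return sorted(
        name for name, created in created_at.items() if now - created >= max_age
    )
```

The actual destroy-and-recreate step would be carried out by the same infrastructure-as-code automation that builds the environments in the first place.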
With the right mindset and supportive environments (labs, development, and test), cloud engineers are empowered to refine the lifecycle management process. Coupling this process with effective automation and configuration management makes lifecycle management easier to carry out.