One of the key requirements for success with cloud computing is self-service: the idea that developers and operators can get access to the resources they need, when they need them. When resources are gated behind manual approval steps and buried in an overflowing ticket queue, cloud technology's major benefits are lost.
In addition, people are expensive resources. VMs and containers are comparatively much cheaper. Expensive resources should never sit waiting for inexpensive resources to be provisioned.
Self-service cloud computing ensures this doesn't happen.
Self-service is a major capability for teams deploying cloud technologies, but the governance challenges are very real. Resources are finite—compute, storage, budget—and unlimited self-service will almost certainly result in the tragedy of the commons, resource exhaustion, or regulatory, compliance, or legislative violations.
The key is finding the middle ground, the guardrails on the sides of the road, without introducing roadblocks. Roadblocks create friction, and friction encourages the growth of shadow IT.
In this article, we'll discuss how to structure a self-service capability around your cloud technologies, which principles to follow, what to include in a decision framework for workload placement, and why frictionless onboarding and self-service are crucial.
1. Start with an understanding of organizational requirements
Cloud capabilities should be driven by the organization's workload requirements. A mature team can focus first on delivering capabilities that serve high-value use cases. Its initial focus is not on providing like-for-like capabilities to compete with the hyperscalers; the focus must be on achieving real, tangible business outcomes quickly.
The team should engage with end-users to determine the characteristics of the workloads that will be deployed on your private cloud, such as storage requirements (IOPS, throughput) and networking requirements, and architect the environment accordingly. Key to this is understanding users' existing pain points and how your cloud solution can remedy them.
A practical approach is to classify those organizational requirements into immediate, near-term, and long-term groups. Immediate requirements must be met on day zero; if they aren't addressed, you can't go live. Near-term requirements are the next focus once go-live is achieved, and long-term requirements are the "nice to have" items completed on an opportunistic basis.
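To make this triage concrete, here is a minimal Python sketch, assuming requirements are tracked as simple records tagged with one of the three horizons. The `Horizon`, `Requirement`, `go_live_blockers`, and `backlog` names and the example requirements are hypothetical, for illustration only; in practice this list would more likely live in a backlog or planning tool than in code.

```python
from dataclasses import dataclass
from enum import Enum


class Horizon(Enum):
    IMMEDIATE = "immediate"  # must be met on day zero, or you can't go live
    NEAR_TERM = "near-term"  # next focus once go-live is achieved
    LONG_TERM = "long-term"  # "nice to have", delivered opportunistically


@dataclass
class Requirement:
    name: str
    horizon: Horizon
    met: bool = False


def go_live_blockers(requirements: list[Requirement]) -> list[str]:
    """Names of immediate requirements that are not yet met."""
    return [r.name for r in requirements
            if r.horizon is Horizon.IMMEDIATE and not r.met]


def backlog(requirements: list[Requirement], horizon: Horizon) -> list[str]:
    """Unmet requirements for one horizon, in input order."""
    return [r.name for r in requirements if r.horizon is horizon and not r.met]


# Hypothetical requirements gathered from end-user interviews.
reqs = [
    Requirement("Block storage >= 5,000 IOPS per volume", Horizon.IMMEDIATE, met=True),
    Requirement("Self-service project onboarding", Horizon.IMMEDIATE),
    Requirement("Published SLOs", Horizon.NEAR_TERM),
    Requirement("GPU-backed instance types", Horizon.LONG_TERM),
]

print(go_live_blockers(reqs))  # ['Self-service project onboarding']
```

An empty result from `go_live_blockers` is the signal that the day-zero bar has been cleared; the near-term and long-term backlogs can then be worked through in order.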
Understand the existing requirements and pain points, and map them clearly to your cloud capabilities. If you don't, you may find that if you build it, they will not come.
2. Establish a decision framework for workload placement
A decision framework helps consumers assess the suitability of public versus private clouds for their workload. The decision framework also forms the basis of future hybrid-cloud policies and automated workload placement.
How heavily each criterion is weighted differs by organization, but a decision framework should include at least the following considerations:
- Regulatory requirements
- Security requirements
- Data ownership and provenance requirements
- Importance of data co-residency
- Elasticity requirements
- Predictability of scaling
- Service continuity requirements
- Workload predictability
- Total workload costs: compute, networking, storage, monitoring, and management
Operations teams should aim to codify this decision framework and make it transparent to end-users. Remember, simplicity and transparency are key to a well-designed guardrail. The more you ask end-users to think about, the more friction you create, and the more likely they are to choose another service or to adopt shadow IT and do it themselves.
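As one illustration of what codifying the framework might look like, the following Python sketch scores public and private placement as a weighted sum and lets a hard constraint, such as a regulatory rule, override the score. The criterion names, the weights, the 0 to 5 fit scale, and the `public_disallowed` flag are all assumptions rather than a prescribed model, and only a subset of the criteria above is included; the rest can be added the same way.

```python
from dataclasses import dataclass

# Hypothetical weights; each organization sets its own values.
WEIGHTS: dict[str, float] = {
    "regulatory": 0.25,
    "security": 0.20,
    "data_co_residency": 0.15,
    "elasticity": 0.15,
    "scaling_predictability": 0.10,
    "total_cost": 0.15,
}


@dataclass
class Assessment:
    """How well each platform fits one workload, per criterion (0 = poor, 5 = excellent)."""
    public: dict[str, int]
    private: dict[str, int]
    # Hard constraint (e.g. a regulatory rule) that rules out public cloud outright.
    public_disallowed: bool = False


def weighted_score(scores: dict[str, int]) -> float:
    """Sum of weight * fit over all criteria; a missing criterion scores 0."""
    return sum(weight * scores.get(name, 0) for name, weight in WEIGHTS.items())


def recommend(a: Assessment) -> tuple[str, float, float]:
    """Return (placement, public_score, private_score) so the reasoning stays visible."""
    pub, priv = weighted_score(a.public), weighted_score(a.private)
    if a.public_disallowed:
        return "private", pub, priv
    if pub > priv:
        return "public", pub, priv
    if priv > pub:
        return "private", pub, priv
    return "either", pub, priv


# Hypothetical bursty, unregulated workload.
bursty = Assessment(
    public={"regulatory": 5, "security": 4, "data_co_residency": 3,
            "elasticity": 5, "scaling_predictability": 5, "total_cost": 3},
    private={"regulatory": 5, "security": 4, "data_co_residency": 4,
             "elasticity": 2, "scaling_predictability": 2, "total_cost": 4},
)
placement, pub, priv = recommend(bursty)
print(placement)  # public
```

Returning both scores, rather than only the verdict, keeps the reasoning visible to end-users, which supports the transparency goal; setting `public_disallowed=True` on the same workload would place it on the private cloud regardless of score.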
3. Remove friction
Consuming public cloud during a project's initial ramp-up phase is relatively frictionless. This ease of use attracts developers and project teams, who often focus on functional requirements before organizational and governance requirements.
Friction is the number one problem that will kill adoption of private cloud technology. As a result, it is crucial to make the onboarding process for new customers as quick and painless as possible. There should be no surprises, and onboarding should be driven by automated, self-service processes, with members of your team available to help new customers onboard.
Mature teams provide a set of capabilities to minimize friction on their private cloud platforms. These include:
- Clear, effective, and concise onboarding procedures for new users to get started
- Self-service capabilities with automated workflow and approvals
- Sensible default environment sizes that cater for usage patterns ranging from experimentation to larger project-based environments
- Access to a choice of contemporary developer workstations and tools, including cloud-native Integrated Development Environments (IDEs) that define development environments as code for Kubernetes-based clouds
- Access to cloud APIs that integrate with IDEs and provide automation capabilities
- Application architecture guidance and patterns to ensure the best usage of the cloud infrastructure
Quotas and resource constraints are always a challenge, and the right limits depend on the organization, project, and team. Constraints that are too tight will choke off the natural growth of the technology and encourage shadow IT. Constraints that are too loose can lead to regulatory or compliance violations or resource exhaustion.
As a rule of thumb, remember that it is easier to loosen resource constraints than tighten them. Expectations, once set, are hard to shift.
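A minimal Python sketch of sensible defaults combined with an automated quota check follows, assuming hypothetical size tiers (`sandbox`, `small`, `large`) and a per-team quota measured in vCPUs, memory, and storage. Requests that fit within the quota are approved automatically; requests that would exceed it are routed for review rather than rejected, so the guardrail does not become a roadblock. All names and numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    vcpus: int
    memory_gib: int
    storage_gib: int

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.vcpus + other.vcpus,
                         self.memory_gib + other.memory_gib,
                         self.storage_gib + other.storage_gib)

    def fits_within(self, limit: "Resources") -> bool:
        """True if every dimension is at or below the limit."""
        return (self.vcpus <= limit.vcpus
                and self.memory_gib <= limit.memory_gib
                and self.storage_gib <= limit.storage_gib)


# Hypothetical default environment sizes, from experimentation to project use.
DEFAULT_SIZES = {
    "sandbox": Resources(vcpus=2, memory_gib=4, storage_gib=20),
    "small": Resources(vcpus=8, memory_gib=32, storage_gib=200),
    "large": Resources(vcpus=32, memory_gib=128, storage_gib=1000),
}


def review_request(size: str, in_use: Resources, quota: Resources) -> str:
    """Auto-approve if the team stays within quota; otherwise escalate to a human.

    Raises KeyError for a size that is not one of the defaults.
    """
    requested = DEFAULT_SIZES[size]
    if (in_use + requested).fits_within(quota):
        return "approved"
    return "needs-review"


# Hypothetical team quota: start conservative, since loosening later is easier.
team_quota = Resources(vcpus=32, memory_gib=128, storage_gib=1000)
in_use = Resources(vcpus=8, memory_gib=32, storage_gib=200)

print(review_request("small", in_use, team_quota))  # approved
print(review_request("large", in_use, team_quota))  # needs-review
```

Escalating instead of rejecting keeps over-quota requests moving while still giving the business a checkpoint, and the quota itself can be raised as real usage patterns emerge.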
4. Define clear Service Level Objectives (SLOs)
SLOs give cloud consumers the information they need to make informed decisions about the platform's suitability for their workload and the architectures they need to adopt. They also give operations teams considerably more flexibility for day-to-day patching, maintenance, and feature roll-outs, because planned disruption is acceptable as long as the platform stays within its objectives.
Service Level Objectives form the basis of the metrics the cloud team uses to communicate the performance of the private cloud environment to management, and they become an indicator of when to scale out the cloud environment.
To be upfront, SLOs are probably not a day zero consideration, but they should almost certainly be a near-term or mid-term requirement. Deciding on SLOs should not take priority over gaining experience with the platform and understanding how it runs in production. After all, you can't set reasonable SLOs until you know how the cloud behaves in the real world.
Adopting an SLO approach is more than deciding on a target number and putting up a dashboard. You will need organizational buy-in, because implicit in an SLO model is the fact that at some point, possibly during business hours, a service will be down, and that's OK as long as the service is still within its SLO. IT departments that have adopted a "we never go down in business hours" or "we never disrupt our users" approach will need to spend time resetting organizational expectations to achieve the benefits that an SLO culture provides.
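The arithmetic behind "down, but still within the SLO" is usually expressed as an error budget: an availability SLO of, say, 99.9% over a 30-day window allows 0.1% of that window, about 43 minutes, of downtime. The Python sketch below assumes an availability SLO expressed as a fraction and a fixed measurement window; the function names and the 20-minute outage are hypothetical.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Downtime an availability SLO (e.g. 0.999) allows within `window`."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return window * (1 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> timedelta:
    """Error budget left after `downtime`; a negative result means the SLO was missed."""
    return error_budget(slo, window) - downtime


def within_slo(slo: float, window: timedelta, downtime: timedelta) -> bool:
    """True if the observed downtime still fits inside the error budget."""
    return budget_remaining(slo, window, downtime) >= timedelta(0)


window = timedelta(days=30)
print(error_budget(0.999, window))  # 0:43:12

# A hypothetical 20-minute maintenance outage during business hours:
outage = timedelta(minutes=20)
print(within_slo(0.999, window, outage))  # True
print(budget_remaining(0.999, window, outage))  # 0:23:12
```

Many teams measure against a rolling window rather than a fixed one, and availability can also be defined by request success rate instead of elapsed time; the budget arithmetic is the same either way.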
Choosing the right SLOs for your self-service capabilities is too broad a topic to cover in one article; there are some excellent books that cover it in detail.
Self-service is what drove public cloud computing: compute resources on demand. No tickets, queues, emails, or phone calls required, just a credit card and a login. Self-service, and in particular eliminating friction, will help drive the adoption of private clouds.
In all cases, start by understanding the organizational requirements, particularly the pain points that exist today. Group requirements into those that must be met on day zero, those that can wait a few months, and those that are long-term goals.
Self-service capabilities let the business put guardrails around the adoption of cloud resources, ensuring regulatory and legal compliance and preventing excessive resource consumption. Integration with service catalogs and a robust, automated onboarding process helps eliminate friction and encourages adoption of your cloud.
As the team matures, adopting Service Level Objectives enables further cultural change in the business. The culture shifts from one where failure is a "this never happens, and if it does, heads will roll" event to one where occasional failure is acceptable, as long as the SLO targets are still met. Users can then decide how to architect their applications to account for your SLO targets.
Once you understand the organization, have provided onboarding capabilities, and have eliminated friction, you're well on the way to a successful rollout. Day 2 is when private cloud operations start to "get real," as a host of ongoing sustainment challenges present themselves. The next article in this series will discuss what a cloud architect needs to consider when it comes to operations teams and how to approach common day-to-day challenges.