Governance in the cloud is a key element in delivering value to the business. It helps you match revenues to costs and confirm that a budget is being used effectively. To align IT (costs) and business (revenues), companies should add metadata that links business drivers with IT resources. Naming conventions and tagging are the ways to add metadata to workloads and use it for visibility, analysis and automation. However, there are plenty of options for tagging, and setting up the right policies can be a challenge. In this post, we will work through some of those challenges and their solutions.
Steps to Cost Management Success
Traditional IT management based on fixed resources stopped making sense with the cloud, a seemingly unlimited pool of resources that can be accessed from any point in the world. Companies are moving from a CAPEX-intensive environment to a new OPEX-based cloud. With the new consumption model that favours the cloud, the weight shifts from asset lifecycle management to resource governance. This generates additional requirements for forecasting and budgeting. But the question is still "are we spending our money well?"
The question is not simple to answer because comparisons are difficult. The first reaction of many organizations is to believe that lower costs are better costs, but that is often wrong.
For instance, it is easy to reduce costs by purchasing a storage service that is cheaper than the one you are using now. However, that change may be associated with a decrease in performance; can your application support it or would you be losing customers - and revenue - in the process? The same thing can happen if you reduce expenses at the cost of limiting the application availability and not investing enough in load balancers, databases or application workers.
To align business, resources and costs you need to take several steps; in this post we will outline some best practices we have gathered on the topic.
Step 1. Define which perspectives you want to report
What is the purpose of IT? Services provided by IT are normally used to support some business process. Linking business processes and costs is a big problem on its own and we are not going to try to solve it here. What we can do, from an IT perspective, is provide the information that will help to map the business to the IT supporting it.
In how many ways does the business need to report the information? What subset of them is feasible (today) to identify? Trying to cover too many of them will make your governance workflows brittle, while covering too few means you won’t be able to provide the information that the business needs.
Fortunately, some of the information is already categorized. AWS and Azure segment the information into accounts, regions and services. Red Hat OpenShift does something similar using clusters, projects and nodes. Those perspectives can give us some insight into the technical requirements that we shouldn’t neglect. Our main objective is not to replicate the native categorization of each of the sources, but to add the missing link between the business and the resources.
Your metadata (mainly names and tags) is used for many things, not only cost management. For example, it may define security profiles or select between automation options, so you need to be conscious of what you are trying to achieve.
Your taxonomy for cost management could consider these different perspectives:
Ownership and usage. The owner and the user of the resource (that is, the unique identifier of the user who requested the resource and of the one who is actually consuming it). In many cases, your automation tool won’t use a separate account for each of your users, and thus you need to use metadata to put that information back.
Tenancy. If your environment is shared, it can be beneficial to understand which group or business unit has requested the resource. When the user can be part of different groups, one needs to be selected. For cost reporting, this is achieved in many cases using the cost center, but department, project, or partner are also good candidates.
Location. Once your organization reaches a global scale it may start deploying pieces of software throughout the world, for regulatory or for performance reasons. Cloud providers already identify the region where your resources are running, but your private cloud can be different.
Environment or stage. You may want to differentiate between development and production, so that different cost decisions can be made depending on the environment where you are creating or running the resources. If your development pipeline already includes stages, like development, tests, staging, preproduction and/or production, this is a good candidate.
Application / Project / Service / Event. What is the service that your resource is helping to provide? Is it a group of transient resources for an event, such as your yearly customer-focused demo show? You could even include the application version.
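The perspectives above could be written down as a small, explicit taxonomy that other tooling can read. The following is a minimal sketch in Python; the tag keys (`owner`, `used-by`, `cost-center` and so on) are illustrative assumptions, not a standard, and should be replaced with the keys your organization agrees on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perspective:
    """One reporting perspective and the tag key that carries it."""
    name: str
    tag_key: str
    description: str


# Hypothetical keys chosen for illustration only.
TAXONOMY = (
    Perspective("Ownership", "owner", "User who requested the resource"),
    Perspective("Usage", "used-by", "User who actually consumes the resource"),
    Perspective("Tenancy", "cost-center", "Group or business unit charged"),
    Perspective("Location", "location", "Where the resource runs"),
    Perspective("Environment", "environment", "Pipeline stage"),
    Perspective("Application", "application", "Service the resource supports"),
)


def tag_keys(taxonomy=TAXONOMY):
    """Return the tag keys in taxonomy order."""
    return [perspective.tag_key for perspective in taxonomy]


print(tag_keys())
```

Keeping the list short and in one place makes it easier to notice when a new perspective is being added without a clear reporting need.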
Step 2. Standardize the presentation of chosen perspectives
The most important piece is consistency. If you want to reflect the business accurately, think about using policies and automation to categorize every resource with all the necessary dimensions; otherwise the data you collect will drift out of sync with the environment you are actually running. Start by giving each resource a name that identifies it without accessing metadata (many clouds publish a guide on how to do this properly; see the Links section at the end for examples), and then continue by adding metadata to it.
Continue by defining the tags that you will use. From step 1, you should have a list of candidates to choose from. It is time to map them into keys and values. Keys will map to perspectives, while values will define the different options allowed for each key. In some cases, the value will be empty (null).
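As an illustration of a name that can be read without looking at metadata, here is a minimal sketch of a naming helper. The pattern `application-environment-region-sequence` and the function names are assumptions made for this example; each cloud's own naming guide should take precedence.

```python
import re

# Assumption: components are lowercase alphanumerics, joined with hyphens.
COMPONENT = re.compile(r"[a-z0-9]+")


def build_name(application, environment, region, sequence):
    """Compose a resource name such as 'billing-prod-euw1-003'."""
    parts = [application, environment, region]
    for part in parts:
        if not COMPONENT.fullmatch(part):
            raise ValueError(f"invalid name component: {part!r}")
    return "-".join(parts + [f"{sequence:03d}"])


def parse_name(name):
    """Split a name produced by build_name back into its components."""
    application, environment, region, sequence = name.split("-")
    return {
        "application": application,
        "environment": environment,
        "region": region,
        "sequence": int(sequence),
    }


print(build_name("billing", "prod", "euw1", 3))
```

Rejecting hyphens inside components keeps the name unambiguous to parse, at the cost of slightly less readable component values.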
However, not all environments allow the same identifiers. As we will see later, there are significant differences in the length and characteristics of keys and values for the different sources. For instance, in many cases Azure limits the number of tags to 15, while other clouds allow up to 50. (Azure is actually increasing its limit to 50 and has started doing so in many services.) In addition, Azure tag names are not case sensitive, while OpenShift labels and AWS tags are.
Given these differences between sources, write a clear policy that states what needs to be tagged, which tags are mandatory and which are optional, leaving no room for interpretation.
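Two of the differences mentioned above, tag-count limits and case sensitivity, can be checked before a tag set is rolled out to several sources. This is a minimal sketch: the limits are simply the figures quoted in the text and should be verified against current provider documentation, and the function names are hypothetical.

```python
from collections import defaultdict

# Limits as quoted in the text; verify against current provider documentation.
MAX_TAGS = {"azure (older services)": 15, "other clouds": 50}


def case_collisions(keys):
    """Group keys that differ only by letter case.

    Such keys are distinct on a case-sensitive source but would clash
    on a source that treats tag names as case insensitive.
    """
    groups = defaultdict(set)
    for key in keys:
        groups[key.casefold()].add(key)
    return {
        folded: sorted(variants)
        for folded, variants in groups.items()
        if len(variants) > 1
    }


def sources_over_limit(tags, limits=MAX_TAGS):
    """Return the sources whose tag-count limit this tag set exceeds."""
    return [source for source, limit in limits.items() if len(tags) > limit]


print(case_collisions(["Environment", "environment", "owner"]))
```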
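A policy with no room for interpretation can be expressed as data and checked mechanically. The sketch below assumes a hypothetical split into mandatory and optional keys; the key names are examples only.

```python
# Hypothetical policy: adjust both sets to your own taxonomy.
MANDATORY = frozenset({"owner", "cost-center", "environment"})
OPTIONAL = frozenset({"application", "location"})


def check_tags(tags):
    """Report mandatory keys that are missing and keys the policy does not know."""
    keys = set(tags)
    return {
        "missing": sorted(MANDATORY - keys),
        "unknown": sorted(keys - MANDATORY - OPTIONAL),
    }


print(check_tags({"owner": "alice", "team": "payments"}))
```

Reporting unknown keys as well as missing ones helps catch near-duplicates, such as `team` used where `cost-center` was intended.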
If the values need to be chosen from a list, make sure that those values are defined, consistent, and easily accessible, or that the list is presented to the user. It does not help to identify development with the value "Development" in some cases, and with "Dev", "DEV", or "R&D" in others.
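One way to keep values consistent is to map every accepted spelling to a single canonical value and reject anything else. This is a minimal sketch; the alias table is an assumption built from the variants mentioned above plus a hypothetical "prod" alias.

```python
# Accepted spellings (already case-folded) mapped to the canonical value.
ENVIRONMENT_ALIASES = {
    "development": "development",
    "dev": "development",
    "r&d": "development",
    "production": "production",
    "prod": "production",
}


def normalize_environment(value):
    """Return the canonical environment value, or raise ValueError."""
    key = value.strip().casefold()
    try:
        return ENVIRONMENT_ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown environment value: {value!r}") from None


print(normalize_environment("DEV"))
```

Raising an error on unknown values, rather than passing them through, surfaces new spellings early so the list can be extended deliberately.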
In the next table you can see a comparison of tags or labels in AWS, Azure and OpenShift 4:
Step 3. Automate Resource Tagging
You can ask users to tag their objects. At Red Hat, there are internal teams that have a clear policy on that: if you don’t tag your resources, they are automatically deleted. But manually tagging each resource is a complicated task. People will likely miss some tags, or make choices that conflict with each other, making your tag strategy brittle and inconsistent.
In some cases, different environments may even be managed by different organizations. Business requirements will likely differ, and thus the business representation in the sources will differ, too. In that situation, the most complicated part is the political side of getting business units aligned.
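A "tag it or lose it" policy like the one described above could start with a report of candidates rather than an immediate deletion. The sketch below is an assumption about how such a check might look: the one-day grace period, the mandatory keys and the resource dictionary layout are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy values.
MANDATORY_TAGS = ("owner", "cost-center")
GRACE_PERIOD = timedelta(days=1)


def deletion_candidates(resources, now):
    """Return (id, missing_keys) for untagged resources past the grace period."""
    candidates = []
    for resource in resources:
        missing = [key for key in MANDATORY_TAGS if not resource["tags"].get(key)]
        expired = now - resource["created"] > GRACE_PERIOD
        if missing and expired:
            candidates.append((resource["id"], missing))
    return candidates


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
inventory = [
    {"id": "vm-a", "created": now - timedelta(days=9), "tags": {}},
    {"id": "vm-b", "created": now - timedelta(hours=1), "tags": {}},
]
print(deletion_candidates(inventory, now))
```

The grace period gives automation and users time to add tags after provisioning before a resource is flagged.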
In Azure, you can use Azure Policy to enforce tagging rules and conventions and to prevent the deployment of resources that do not comply with your expectations. You can create a policy that automatically applies the needed tags during provisioning, that enforces a predefined format for dates, or that simply makes some tags mandatory for some resource types.
In AWS, you can use IAM policies to enforce similar tagging rules. You can also use an automation tool like Ansible to add the necessary tags during provisioning and make sure that all the resources have been properly tagged.
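To make the idea of a mandatory tag concrete, the sketch below builds a policy definition as a Python dictionary and prints it as JSON. Its structure is modeled on the Azure Policy definition format (a rule that denies resources where a tag does not exist); treat it as an illustration under that assumption and verify it against the current Azure Policy documentation before use. The function name and display name are hypothetical.

```python
import json


def require_tag_policy(display_name):
    """Build a policy definition that denies resources missing a given tag.

    The tag name is left as a policy parameter ('tagName') so the same
    definition can be assigned once per mandatory tag.
    """
    return {
        "properties": {
            "displayName": display_name,
            "mode": "Indexed",
            "parameters": {"tagName": {"type": "String"}},
            "policyRule": {
                "if": {
                    "field": "[concat('tags[', parameters('tagName'), ']')]",
                    "exists": "false",
                },
                "then": {"effect": "deny"},
            },
        }
    }


print(json.dumps(require_tag_policy("Require a tag on resources"), indent=2))
```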
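On the AWS side, one common pattern is an IAM statement that denies a request when a required tag is not supplied. The sketch below builds such a policy document as a Python dictionary; it assumes the `aws:RequestTag/<key>` condition key with the `Null` operator and limits itself to `ec2:RunInstances` as an example. The function name and the `cost-center` key are hypothetical, and the result should be reviewed against the IAM documentation before it is attached to anything.

```python
import json


def require_tag_on_launch(tag_key):
    """Build an IAM policy that denies EC2 launches missing the given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLaunchWithoutTag",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # "Null": "true" matches requests where the tag is absent.
                "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
            }
        ],
    }


print(json.dumps(require_tag_on_launch("cost-center"), indent=2))
```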
Step 4. Review the outcome regularly
It is a good practice to define tags and use them as early as possible, even if you need to adjust your tagging scheme afterwards. A baseline strategy allows you to test it with your business owner to see if it is really helping you to generate the right reports faster and more easily.
Every few weeks, you should be ready to review your tagging strategy and apply lessons learned: not only can your IT environment benefit from additional optimizations, but your business organization will also likely evolve. For example, new services are launched, companies are acquired and acquire other companies, the market evolves in a new direction, or some cloud launches a shiny new service that promises to completely change the architecture of your applications.
For that reason, it is important to assess the maturity of your organization and the tools available to it. Adopting more features of your automation tool can make your workflow leaner and your environment more productive. On the other hand, new versions of your reporting, automation or governance tools can provide new ways to add metadata to your environment and thus enable new workflows, additional reports and better governance of your resources and costs.
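A simple number to track between reviews is the share of cost that carries a given tag. The sketch below assumes cost line items are available as dictionaries with a `cost` and a `tags` field; that layout and the function name are hypothetical.

```python
def tag_coverage(line_items, tag_key):
    """Return the fraction of total cost whose line items carry tag_key."""
    total = sum(item["cost"] for item in line_items)
    if total == 0:
        return 0.0
    tagged = sum(
        item["cost"] for item in line_items if item["tags"].get(tag_key)
    )
    return tagged / total


items = [
    {"cost": 75.0, "tags": {"cost-center": "1234"}},
    {"cost": 25.0, "tags": {}},
]
print(f"{tag_coverage(items, 'cost-center'):.0%}")
```

Weighting by cost rather than by resource count focuses the review on the untagged resources that matter most to the budget.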
How to put it all into practice
Tagging can help your IT work closer to the business and deliver key information more easily and quickly. We believe the best way to provide solutions to IT problems is to do that in the open. Joint innovation can help everybody, especially with hard problems like this one, where it is really complicated to get things right.
For that reason, at Red Hat we are working hard on an open source project that is focused on cost management of containers and cloud. The project is named after a Japanese word (石, koku, a unit of volume for dry measure), and the code is freely available under the AGPL 3.0.
We are working to make everything described in this post possible, and will be increasing the project's functionality with the help of partners and customers, the open source way. Feel free to take part in the conversation and contact us if you want to discuss features and the roadmap.