As more organizations move into the public cloud, many run into the same familiar challenges. Cloud computing enables many new networking, storage, and computing scenarios, and even the most trivial migration may bring major changes to your organization's infrastructure stack and its cost structure. Here are some common challenges that you may encounter and proven ways to work around them.
Challenge: Increased cost
One major difference between the public cloud and on-premises solutions is what I like to call the "single bill" problem. If you own your data center or pay rent in a co-location data center, costs are spread out over many purchase orders. You buy servers from one vendor, storage from another, networking from a third, and core infrastructure software services from yet another set of vendors. These invoices and budgets often come from various parts of the IT organization, giving those paying an incomplete picture of the infrastructure's total cost of ownership. Here is where the challenge comes in.
When you move to the public cloud, all those services arrive in one invoice.
At a past employer (a Fortune 100 telecommunications firm), a single application represented 75% of our IT infrastructure costs. It was a core business application, but it did not directly make money for the organization. We had an advanced ability to quantify expenses, but the charges were spread across many budgets and organizations, so it was challenging to get anyone to take action to reduce the application's cost.
In my opinion, having a single bill is a good thing: it allows organizations to reduce waste and identify outlier systems. Having a single statement for the application I outlined above would have gone a long way to drive action.
The importance of OpEx and CapEx
One of the things that changes with cloud computing is the balance between operational and capital expenses (OpEx and CapEx). In an on-premises world, major IT purchases like servers and storage are capital expenses, defined as purchases of significant goods or services that will help the company over time. The cost of such an asset is spread out over multiple years, allowing companies to realize tax benefits through depreciation. Capital expenses can be financed via debt or collateral. Operational expenses, or OpEx, are day-to-day expenses like rent, taxes, or travel costs. In the early days of cloud computing, most organizations aimed to move all IT expenses into OpEx. However, as larger enterprises have moved into the public cloud, providers have adapted their offerings to include both CapEx and OpEx options.
Having this single bill brings heightened executive awareness of the cost of IT services.
Solution: How to avoid cloud cost concerns
When talking to cloud experts and architects, you will often hear that the most expensive way to operate in the cloud is a lift-and-shift migration. While storage in the cloud tends to be cheap compared to on-premises options, standard virtual machines (VMs) are relatively expensive. There are a couple of ways to reduce these costs without completely rearchitecting your systems, which cannot happen overnight. One of the easiest is to reserve your VMs for one to five years, which converts the charge into a capital expense as opposed to an operating expense. In my experience, you can see discounts of up to 70% for reserved VM instances. These reservations can typically be traded within virtual machine classes, but there is a penalty associated with canceling them, so you should evaluate the contract carefully.
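To make the reservation trade-off concrete, here is a rough Python sketch of the arithmetic. The hourly rate and the three-year term are hypothetical placeholders rather than any provider's real pricing, the 70% figure is simply the upper end of the discounts mentioned above, and the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years are ignored


def on_demand_cost(hourly_rate: float, years: int) -> float:
    """Cost of running one VM around the clock at the pay-as-you-go rate."""
    return hourly_rate * HOURS_PER_YEAR * years


def reserved_cost(hourly_rate: float, years: int, discount: float) -> float:
    """Cost of the same VM under a reservation with a fractional discount."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in the range [0, 1)")
    return on_demand_cost(hourly_rate, years) * (1.0 - discount)


def break_even_utilization(discount: float) -> float:
    """Fraction of hours a VM must run before reserving beats on-demand.

    A reservation is billed for every hour, so it only wins when the VM
    would otherwise run for more than (1 - discount) of the time.
    """
    return 1.0 - discount


if __name__ == "__main__":
    rate = 0.50      # hypothetical on-demand price per hour
    term_years = 3   # hypothetical reservation term
    discount = 0.70  # upper end of the discounts quoted in the text

    print(f"On-demand: {on_demand_cost(rate, term_years):,.2f}")
    print(f"Reserved:  {reserved_cost(rate, term_years, discount):,.2f}")
    print(f"Break-even utilization: {break_even_utilization(discount):.0%}")
```

The break-even figure is the useful one. With a 70% discount, a reservation only pays off if the VM would otherwise run more than roughly 30% of the time, which is worth checking before you reserve machines that sit idle outside business hours.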
Another simple approach is to turn off non-production VMs when they are not in use. When a VM is deallocated, you are only responsible for the underlying storage costs. In an on-premises world, that wouldn't make a lot of sense since we own and keep paying for the infrastructure. With cloud infrastructure, there can be significant savings. This approach can be automated and combined with other cloud governance mechanisms like locking and tagging resources to reduce costs for your lower priority environments. If you move into a Platform as a Service offering, one of the questions you should ask is about the ability to pause your resources—some services include this option to reduce costs.
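As an illustration of how such automation might decide what to shut down, the sketch below applies a tag-based rule in plain Python. The "environment" tag, the 07:00 to 19:00 weekday window, and the VM class are all assumptions made for this example. A real implementation would call your provider's SDK or scheduler to deallocate the machines, which is deliberately left out here.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class VM:
    """Hypothetical stand-in for a VM record returned by an inventory query."""
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def should_be_running(vm: VM, now: datetime,
                      start_hour: int = 7, stop_hour: int = 19) -> bool:
    """Production VMs always run; others run only on weekdays in the window."""
    if vm.tags.get("environment") == "production":
        return True
    is_weekday = now.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and start_hour <= now.hour < stop_hour


def weekly_running_fraction(start_hour: int = 7, stop_hour: int = 19) -> float:
    """Share of the week a non-production VM stays allocated under the rule."""
    return (stop_hour - start_hour) * 5 / (24 * 7)


if __name__ == "__main__":
    fleet = [
        VM("prod-db", {"environment": "production"}),
        VM("dev-db", {"environment": "dev"}),
    ]
    now = datetime.now()
    for vm in fleet:
        action = "keep running" if should_be_running(vm, now) else "deallocate"
        print(f"{vm.name}: {action}")
    print(f"Non-production uptime: {weekly_running_fraction():.0%} of the week")
```

Under these assumed hours, a non-production VM is allocated for 60 of the 168 hours in a week, so the compute charge drops by well over half while the storage charge stays the same.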
Challenge: Security considerations
One of the more common concerns around cloud computing is the security of every aspect of the system architecture. According to Gartner, 90% of the organizations that fail to control public cloud use will inappropriately share sensitive data by 2025. This risk stems most directly from misconfiguration. For example, according to Business Insider, companies both large and small have been breached because they left Amazon S3 storage buckets open to public access. While public object storage can be useful for something like a static website, storing sensitive data in anything exposed directly to the Internet invites a data breach. As an experiment, I once put a database server (with no data) on the public Internet with a very weak admin password, and it was breached within minutes. The risk from both automated and targeted attacks is real.
Most external risks to your organization involve putting resources on public networks that should only remain private. You should also evaluate your internal risks and ensure that your cloud control plane is secured with two-factor authentication and even privileged identity management for administrator accounts. Compared to an on-premises environment, you have a potentially larger surface area to secure when you include cloud management.
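A simple audit for both kinds of risk can be sketched in a few lines of Python. The inventory below is entirely hypothetical: the Resource and Admin classes and their fields are invented for illustration, and in practice the data would come from your provider's inventory or policy tooling rather than a hand-written list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical inventory record for a cloud resource."""
    name: str
    kind: str        # for example "bucket" or "database"
    public: bool     # reachable from the Internet
    sensitive: bool  # holds data that must stay private


@dataclass(frozen=True)
class Admin:
    """Hypothetical record for an account with control-plane access."""
    user: str
    mfa_enabled: bool


def find_exposed(resources: list[Resource]) -> list[str]:
    """Names of resources that are both public and hold sensitive data."""
    return [r.name for r in resources if r.public and r.sensitive]


def admins_without_mfa(admins: list[Admin]) -> list[str]:
    """Administrator accounts that lack two-factor authentication."""
    return [a.user for a in admins if not a.mfa_enabled]


if __name__ == "__main__":
    inventory = [
        Resource("marketing-site", "bucket", public=True, sensitive=False),
        Resource("customer-exports", "bucket", public=True, sensitive=True),
        Resource("orders-db", "database", public=False, sensitive=True),
    ]
    admins = [Admin("alice", True), Admin("bob", False)]

    print("Exposed sensitive resources:", find_exposed(inventory))
    print("Admins without MFA:", admins_without_mfa(admins))
```

The public static website passes the check, while the public bucket holding sensitive exports is flagged, which mirrors the distinction drawn above.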
Solution: How to avoid cloud security risks
While the public cloud allows you to make some dramatic mistakes, it is, in general, more secure than most on-premises environments. As mentioned in this research by Network World, cloud architectures are supported by expert architects who spend their time maximizing the data privacy and data resiliency of their services. A cloud service can offer far more nines of availability than most companies can achieve on-premises.
While misconfiguration is a real concern, the public clouds all have many security features to mitigate the risk. The problem is usually that standard best practices are overlooked for the sake of development velocity. The major clouds all enable encryption at rest for storage of all types, with either user-managed or platform-managed keys. Additionally, multiple layers of network security let you manage network traffic in a granular manner and segregate public and private network traffic.
Practices like encrypting data at rest and securing the secrets your code uses are much easier in the public cloud. The cloud providers also offer security dashboards that scan your environment against best practices, and these tools evolve over time. The dashboards typically use a freemium model, where baseline reporting is included in your cloud service and proactive correction is part of the paid services. For most services, the additional costs are relatively small, and the added value is high.
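For a sense of what each additional nine means, the following sketch converts an availability target into a yearly downtime budget. It is plain arithmetic that assumes a 365-day year, and the function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability_percent(nines: int) -> float:
    """Availability as a percentage, e.g. 3 nines is 99.9%."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return 100.0 * (1.0 - 10.0 ** -nines)


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget, in minutes, for an N-nines availability target."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return MINUTES_PER_YEAR * 10.0 ** -nines


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines ({availability_percent(n):.3f}%): "
              f"{allowed_downtime_minutes(n):,.1f} minutes of downtime per year")
```

Three nines leaves a budget of roughly 8.8 hours per year, while five nines leaves only a little over five minutes, which is difficult to achieve with a single on-premises data center.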
Challenge: Performance concerns
Another concern about the public cloud comes in the form of performance expectations. The most common problem is storage access times. For applications that need a high number of input/output operations per second (IOPS), such as online transaction processing (OLTP) databases, even minor throughput variations can cause an outage. This gets even trickier when the application is also sensitive to shifts in latency. Any application that regularly causes high spikes in throughput or bandwidth between compute nodes and storage can be a valid concern for cloud migration.
A crafty cloud architect may consider switching from shared storage to local storage as they migrate to the cloud. While the "local" storage configuration may meet the needs of the specific application workloads, the change in architecture may prove prohibitive in new and unexpected ways. Unlike networked storage in the cloud, the local storage options do not persist data. To use these options for a database workload, you would need to keep at least three copies of your data across cloud data centers so that it survives a provider failure. That much duplication, with limited control over local storage, can balloon costs. In these cases, you may consider leaving those systems on-premises, where you may be able to run them more cost-effectively.
Solution: How to avoid performance concerns
Performance is always a concern with IT systems. The most important approach you can take is to quantify performance with metrics like CPU utilization, latency, and other application-specific measurements. These metrics provide a baseline for your systems and let you measure the effect of a migration objectively. It is important to test with production workloads before you execute your migration. I've been the victim of application teams who said they had tested, but when we migrated to production, latency skyrocketed and the team scrambled to solve the performance issue. Fortunately, it is very easy to get more hardware resources in the cloud. Testing and having this data is key to a successful cloud migration effort.
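One way to turn those measurements into a go/no-go check is to compare a high percentile of latency before and after a migration test. The sketch below assumes you have already collected latency samples in milliseconds from both environments. The 95th percentile and the 10% tolerance are arbitrary choices for illustration, not recommendations, and the function names are hypothetical.

```python
import statistics


def p95(samples: list[float]) -> float:
    """95th percentile, using the statistics module's default method."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples, n=100)[94]


def regression_ratio(baseline: list[float], candidate: list[float]) -> float:
    """How many times slower the candidate's p95 is than the baseline's."""
    return p95(candidate) / p95(baseline)


def within_tolerance(baseline: list[float], candidate: list[float],
                     tolerance: float = 1.10) -> bool:
    """True if the candidate's p95 is at most `tolerance` times the baseline."""
    return regression_ratio(baseline, candidate) <= tolerance


if __name__ == "__main__":
    # Made-up latency samples in milliseconds, for demonstration only.
    on_prem = [float(ms) for ms in range(1, 101)]
    cloud_test = [ms * 1.5 for ms in on_prem]

    print(f"Baseline p95:  {p95(on_prem):.2f} ms")
    print(f"Candidate p95: {p95(cloud_test):.2f} ms")
    print("Within tolerance:", within_tolerance(on_prem, cloud_test))
```

Comparing a percentile rather than an average matters here because the latency spikes that hurt users tend to hide in the tail of the distribution.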
When you hear someone say, "cloud can't perform like our hardware," that perception is most likely due to outdated experiences. In the early days of the cloud, the available VM sizes were small. Now, you can deploy VMs with 416 CPUs and 12 TB of RAM. The early public cloud also offered only spinning-disk storage, which provided limited performance. As of 2020, each cloud provider offers fast network-attached SSD storage in the range of 100,000-200,000 IOPS and locally attached storage options that can supply millions of IOPS (remember, locally attached storage may require architectural changes to your application). In my experience, networked storage can meet the needs of about 98% of applications.
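When comparing your workload against those figures, it helps to translate IOPS into throughput, since the two are linked by the I/O size. The sketch below is plain arithmetic. The 8 KiB block size and the 80% headroom factor are assumptions chosen for illustration (your database's actual I/O size may differ), and the function names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def throughput_mib_per_s(iops: int, block_size_bytes: int) -> float:
    """Sustained throughput implied by an IOPS figure at a fixed I/O size."""
    return iops * block_size_bytes / MIB


def fits_storage_tier(required_iops: int, tier_max_iops: int,
                      headroom: float = 0.8) -> bool:
    """True if the requirement stays within a safety margin of the tier limit."""
    return required_iops <= tier_max_iops * headroom


if __name__ == "__main__":
    block = 8 * KIB  # assumed I/O size
    for iops in (100_000, 200_000):
        print(f"{iops:,} IOPS at 8 KiB: "
              f"{throughput_mib_per_s(iops, block):,.2f} MiB/s")
    print("60,000 IOPS fits a 100,000 IOPS tier:",
          fits_storage_tier(60_000, 100_000))
```

Keeping some headroom below the advertised limit is a cautious design choice, because the spikes described earlier are what tend to cause trouble rather than the average load.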
Putting it all together
Cloud migration presents new challenges when managing IT architecture. The three challenges covered today involve cost, security, and performance. While the risks are important to acknowledge, all three are navigable with the right information and the right team of cloud engineers or cloud architects. In short, embrace OpEx spend, use the excellent cloud security tools available, and acknowledge how far cloud performance has come. If you navigate each of these concerns, shifting to a hybrid cloud infrastructure will be a valuable way to evolve your company's architecture.