In today’s IT environment, enterprise applications can be complex, scalable, distributed, component-based, and often mission-critical. They may be deployed across private, public, or hybrid clouds. They may access sensitive data, be subject to regulatory guidelines and stringent security policies, and yet still need to be as user friendly as possible. In short, these applications are highly complex. In this post we'll talk about how those considerations mesh with automation tools like Ansible.
In my previous post, "Adventures with Ansible: Lessons learned from real-world deployments," I walked through practical considerations for successfully using Ansible in your environment. In today’s post, I dive deep into what I see as best practices gleaned from my experience in the field with real-world deployments.
Designing and developing enterprise applications means satisfying any number of separate requirements. What’s more, development decisions you make to satisfy one requirement may affect other requirements, often in ways that are difficult to understand or predict. The failure to meet these requirements could mean the failure of a project. Some of the enterprise applications you might find nowadays include:
Automated Billing Systems (ABS)
Email marketing systems
Content Management Systems (CMS)
Call center and customer support
Customer Relationship Management (CRM)
Enterprise Resource Planning (ERP)
Business Continuity Planning (BCP)
Enterprise Application Integration (EAI)
Messaging and collaboration systems
Because of the real-world complexity of these applications, organizations have given me the following reasons (among others) why they haven't looked to automate their enterprise applications:
It cannot be automated!
It wasn’t originally meant to be automated.
It is too complex and too costly to automate.
It is already automated using shell scripts and various other tools and tricks.
We currently have no pain in our operations.
Our IT hero handles any problems with the enterprise product.
Perhaps they also recognize that this is more than a technical challenge. Let’s be honest: automation is not only about the technology; it is a different way of thinking. It demands a cultural change.
When in Germany on a customer engagement, I was given the opportunity to lead an automation team with the goal of automating their software stack deployments across 10 different environments using Ansible, Jenkins and Vagrant. They had been doing deployments to production every three months and experienced long downtimes, unexpected issues during deployment, and more. The pressure and complexity of delivering a release at midnight with 10 experts residing onsite just didn’t make sense anymore in the modern IT world.
They wanted an automated solution! They wanted to reduce costs, increase reliability, standardize their environments, eliminate repetitive manual tasks, enable their teams to spend more time on improving their systems versus managing their systems, and so on.
Overall, the project was a success. However, it wasn’t an easy journey. Here is what happened…
So let’s start with some of the problems that existed...
We were deploying everything manually, which created some of the following challenges:
Production deployments occurred every three months and could take up to 8 hours to complete. They were performed in the middle of the night, and required the help of 5 to 10 different "heroes."
Each hero had his/her own custom scripts to perform the various tasks necessary to get each release working in each environment.
It was a huge organizational challenge, and every deployment had its own unique issues.
Reliability - Sometimes applications would not start (due to an incorrect shutdown sequence), and everyone had his or her own tricks for starting them back up. Some apps were also not configured the same in every environment, further compromising reliability.
Human interaction - Deployments often required human interaction to perform an action on the website to properly deploy.
Inconsistent Environments (a.k.a. The Human Factor)
At this time, our environments were simply inconsistent. Deployments were performed manually, and somehow we were always able to make them succeed. Some folks had special scripts, special folders on target servers, etc. Overall, this resulted in errors occurring on Prod that had not occurred in earlier environments. It is a painful and costly issue when your production environment is down for longer than you had expected.
This was basically the "human factor."
Before automation, our culture was basically waterfall: we deployed to production very seldom.
It all started with the adoption of Agile methodologies and the formation of Scrum teams. Then we created an Automation Team within our Ops department to drive the automation of our deployments and reduce their manual complexity and inconsistency. Overall, the move from waterfall to agile took over two years and can still be improved. The process simply continues to grow and change, and that’s the point of agile.
We implemented a better way.
Automation software, Agile methodologies, Scrum teams drawing from both the Dev and Ops departments, an Automation Team focused on delivering automated tools, and a daily process of moving our company culture to a new way of thinking were all the pieces needed to ultimately provide an elegant solution and greatly reduce the pain around deployments.
It took a lot of time and a lot of work from everyone involved.
Here is an overview of some of the tools we used to automate everything in our environment:
First and foremost, you need to have a central repository for your artifacts. The development team most likely is already using one to store the binary artifacts that they build.
Jenkins is your CI/CD pipeline automation tool. Most likely your development team is already using Jenkins or something similar for automated builds. However, for deployment automation we decided to create a separate instance of Jenkins since we wanted to manage the firewalls differently.
Pipelines were created to orchestrate the phases of our deployment process: prepare, shutdown, deploy, and startup. We wrote our pipelines in Groovy (object-oriented classes, functions, etc.), so not only did we develop standards (reusable code), but if Jenkins crashed or everything fell apart, we could rebuild very quickly from our code.
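A phase-per-stage pipeline like the one described could be sketched as the Jenkinsfile below. This is a hypothetical illustration, not the team's actual code: the playbook path (site.yml), inventory layout, parameter names, and the runPhase helper are all assumptions.

```groovy
// Hypothetical sketch of a phase-per-stage deployment pipeline.
// Each stage runs the same playbook, limited to one phase via Ansible tags.
pipeline {
    agent any
    parameters {
        string(name: 'DEPLOY_VERSION', description: 'Artifact version to deploy')
        string(name: 'TARGET_ENV', defaultValue: 'Test1', description: 'Inventory to target')
    }
    stages {
        stage('Prepare') {   // stages artifacts; no downtime required
            steps { runPhase('prepare') }
        }
        stage('Shutdown') {
            steps { runPhase('shutdown') }
        }
        stage('Deploy') {
            steps { runPhase('deploy') }
        }
        stage('Startup') {
            steps { runPhase('startup') }
        }
    }
}

// Reusable helper: one ansible-playbook call per phase, selected by tag.
def runPhase(String phase) {
    sh "ansible-playbook -i inventories/${params.TARGET_ENV} site.yml " +
       "--tags ${phase} -e deploy_version=${params.DEPLOY_VERSION}"
}
```

Keeping the per-phase logic in a shared helper like this is one way to get the "reusable code" standardization the team was after.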
Remember, automate even the automation tools! So we also automated the installation of Jenkins. We additionally installed a scaled-down Jenkins on our Control Node in order to test the pipelines locally before we checked in Groovy or Ansible code. It was installed with the exact same Jenkins plugin versions as what was running in our production Jenkins cluster.
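Installing Jenkins with pinned plugin versions, so the Control Node copy matches production, might look like the play below. The plugin names, version pins, and host group are illustrative assumptions, not the team's actual configuration.

```yaml
# Hypothetical sketch: installing Jenkins itself with Ansible, with plugin
# versions pinned so the local Control Node instance matches production.
- name: Install and configure Jenkins
  hosts: jenkins
  become: true
  vars:
    jenkins_plugins:          # example pins only
      - { name: workflow-aggregator, version: "2.6" }
      - { name: git, version: "4.8.3" }
  tasks:
    - name: Install Jenkins package
      ansible.builtin.package:
        name: jenkins
        state: present

    - name: Install pinned plugins
      community.general.jenkins_plugin:
        name: "{{ item.name }}"
        version: "{{ item.version }}"
        with_dependencies: false
      loop: "{{ jenkins_plugins }}"

    - name: Ensure Jenkins is running
      ansible.builtin.service:
        name: jenkins
        state: started
        enabled: true
```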
Ansible was at the core of everything we were doing.
We moved all of our configuration into Ansible inventory using a multi-stage inventory structure (handling Dev1, Dev2, Dev3, Test1, Test2, Test3, etc.) as well as shared variables and group_vars. It took a lot of time to migrate the application configuration that originally resided within the artifacts the Dev team built using their Jenkins instance. Credentials were stored using Ansible Vault.
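A group_vars file in such a multi-stage inventory could look like the sketch below. The file path, variable names, and values are hypothetical; the key idea is that plain settings live in group_vars while secrets reference variables kept in a separate ansible-vault-encrypted file.

```yaml
# Hypothetical inventories/Test1/group_vars/appservers.yml
# (variable names and values are illustrative, not from the original post)
app_port: 8443
artifact_repo_url: "https://repo.example.com/releases"
db_host: "test1-db.example.com"

# Secrets live in a separate vaulted file (encrypted with ansible-vault)
# and are referenced here rather than stored in plain text:
db_password: "{{ vault_db_password }}"
```

Repeating this layout per stage (Dev1, Dev2, Test1, ...) keeps each environment's configuration isolated while shared variables live higher in the inventory tree.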
Our software deployments could be broken down into the following phases: prepare, shutdown, deploy, and startup. As a result, we developed our Ansible roles with the tasks/main.yml entry point handling these phases using tags. See below for an example. This allowed a Jenkins pipeline to run all of our Ansible roles while executing just the prepare phase, which pushes the new artifact to the target servers without affecting any running services. Since the prepare phase requires no downtime, running it ahead of the release window saved us a lot of prep time.
```yaml
---
- name: Prepare artifact
  import_tasks: prepare.yml
  tags: prepare

- name: Shutdown services
  import_tasks: shutdown.yml
  tags: shutdown

# … deploy and startup phases follow the same pattern
```
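The prepare.yml referenced by that entry point might then stage the new artifact without touching running services. This is a hypothetical sketch; the release paths, artifact naming, and variables (deploy_version, artifact_repo_url) are assumptions.

```yaml
# Hypothetical tasks/prepare.yml: stage the new artifact on the target host
# without touching running services. Paths and variable names are illustrative.
- name: Create release directory for the incoming version
  ansible.builtin.file:
    path: "/opt/app/releases/{{ deploy_version }}"
    state: directory
    mode: "0755"

- name: Download the artifact from the central repository
  ansible.builtin.get_url:
    url: "{{ artifact_repo_url }}/app-{{ deploy_version }}.war"
    dest: "/opt/app/releases/{{ deploy_version }}/app.war"
    mode: "0644"
```

Because these tasks only write to a new release directory, they are safe to run against a live system, which is what makes the downtime-free prepare phase possible.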
We used Vagrant and VirtualBox to provision a Control Node with a scaled-down installation similar to our Jenkins server, from which our Ansible playbooks were executed. This was important because we needed a way to test and debug automation issues.
The local Vagrant box was provisioned with the same Ansible version, Jenkins version, Jenkins plugins, OS packages, and volume names as our main Jenkins master node. Of course, it was scaled down to 1GB of memory and 1 vCPU, but that was enough for testing purposes. Additionally, Vagrant was configured with boxes that emulated our production stack. This allowed us to run our playbooks from the Control Node to provision and deploy to one or more target hosts (web servers, application servers, etc.).
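A Vagrantfile for that setup could look like the sketch below: a scaled-down Control Node plus a target box emulating a production web server. The box names, hostnames, and IP addresses are illustrative assumptions.

```ruby
# Hypothetical Vagrantfile sketch: a scaled-down Control Node plus one target
# box emulating a production web server. Box names and IPs are illustrative.
Vagrant.configure("2") do |config|
  config.vm.define "control" do |control|
    control.vm.box = "centos/7"
    control.vm.hostname = "control-node"
    control.vm.network "private_network", ip: "192.168.56.10"
    control.vm.provider "virtualbox" do |vb|
      vb.memory = 1024   # scaled down: 1 GB RAM, 1 vCPU
      vb.cpus = 1
    end
  end

  config.vm.define "web1" do |web|
    web.vm.box = "centos/7"
    web.vm.hostname = "web1"
    web.vm.network "private_network", ip: "192.168.56.11"
  end
end
```

With both machines on the same private network, playbooks run from the Control Node can target web1 exactly as they would target a real environment.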
Subversion was our version control system (VCS) at the company. Initially we sought to change this, but Subversion was used across the entire company. It would have been a big, disruptive change. Using Git for only our team would have created a new silo and added another layer to the stack. I believe in keeping it simple, and we never hit a problem that justified forcing a switch from Subversion to Git. Sometimes it's better to know when to rip and replace and when to stick with a technology, even if it's not the latest cool tool.
What are Agile methodology and Scrum teams without Jira? Plan, track, and release software using Atlassian's very useful tracking software. One of the biggest challenges of our automation transformation was defining user stories in Jira and understanding MVP (Minimum Viable Product).
No change happens without some challenges and our challenges spanned all possible areas.
At the time, some applications depended on Internet Explorer and ActiveX technology. To get around that, we used PhantomJS, a headless browser, to automate them.
Both Dev and Ops were interested in doing DevOps, but it was difficult to align the priorities the two departments had set for the DevOps movement; each had different leaders with different overall goals.
Some folks on the team were not interested in automation at all, at least in the beginning. The attitude was "it is already automated." That's why it is important that your company spends time defining what automation means. Many folks also see automation as yet another language they need to learn. With a lot of patience, by "planting seeds" and showing small success stories, we were able to convince them to use Ansible automation.
What did we accomplish? Overall, it was an amazing success story that needed to be shared (hence, this blog). We went from zero automation to a very successful team that now deploys every two weeks to production using a single pipeline that needs almost no human interaction or expertise to operate.
Some of the successes and outcomes we experienced include a standardized procedure for prepare, shutdown, deploy, and startup across 10 environments. So far it has been reliable and started cleanly every time, and deployment time was shaved from two hours to 20 minutes.
By automating the pipeline, we've removed the need for manual operation to make deploys happen and the errors that go along with that. Even with robust documentation and practiced IT staff, there's a lot of margin for error when you're depending on people to step through a series of commands and operations manually. Get one out of order, or skip a step, and things can go sideways quickly.
Our results speak for themselves, but in addition to business outcomes we also learned quite a bit in the process.
Be aware that not everyone wants to automate, at least not in the beginning. You're going to need something you won't find in Ansible or any other automation tool: patience, and a lot of it, to embark on a major automation project.
Start small, and automate something that will give you a win that you can show off as a success story. You want to solve a pain point that people have, but don't try to jump right in on the biggest problem right away without experience under your belt or any buy-in around automation. A misfire early on could kill efforts to automate and cause people to dig in with the idea that automation isn't possible in their environment.
Like many other things in IT, 80% of the problems you'll face are going to be reasonably straightforward to solve. It's the other 20% that will really challenge you when automating, because of corner cases, exception handling, and other unusual situations that may crop up. But these are usually challenging, not impossible.
For seasoned system administrators and ops folks, the temptation to simply SSH into a host to fix a problem is strong. Resist that urge! The goal is to have consistent, predictable, standardized and reliable systems. So figure out how to change (or create) Ansible playbooks so that you do not need to SSH into the box. Fix the source of your problems, not just the symptoms.
Lastly, you're not done until you automate your automation. You want to automatically install and upgrade Ansible, Jenkins, Jenkins plugins, and so forth. Do not manually install these or you are creating the same problem that you have been fixing with Ansible on the target hosts. You will have unreliable and inconsistent environments for your automation tools. Not a good foundation!
Throughout my career I was always automating everything I could. I hated doing something twice manually. The problem was that I was using the wrong tools.
Finally automating the software that I had been installing, configuring, deploying and patching for over 10 years was an incredible eye-opening experience to the full capabilities of configuration management, CI/CD, pipelines, idempotency, etc. And Ansible was at the core of it all.
So don’t be afraid anymore! Go automate your enterprise applications - start today! For more information on how Red Hat Services can help up-level your automated enterprise, take a look at our consulting offerings.