I once saw a tee shirt that read, "Go away or I'll replace you with a shell script." It captures what many fear in today's business climate: being asked to do more with less. After all, just as printing presses replaced book scribes, today's automation could replace many system administration tasks. But sysadmins, as change agents, should not fear automation. They should embrace it. The printing press, after all, opened up a whole new book industry. IT automation is an opportunity for sysadmins to spend more time on high-value work and less time on rote tasks.

If automation is such a good idea, where is the best place to start? Each enterprise is unique and no single solution fits everywhere. Organizations that wish to implement automation need leadership from the top and a core team of sysadmins to choose the tooling and how to use it. I'll cover the scenario presented in my first post in this series about a growing enterprise with administration functions divided into different groups. These groups want to maximize system performance and reliability, but differences in tooling limit what they can do.

As with my previous entry, I'll focus on Ansible by Red Hat for automation and deployment, but we also need to discuss version control systems (VCS) and their basic usage. A VCS allows multiple people to work on the same code base: each member has their own copy of the working files, and once their changes are complete and tested, they check them into the master copy. Past revisions are kept, so errors can be rolled back at any time.
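As a quick illustration of that check-in and rollback cycle, here is a throwaway session using Git (the VCS used in the rest of this post). The repository, file names, and commit messages are hypothetical, just a sketch of the workflow:

```shell
# Throwaway repo to illustrate check-in and rollback. All names are hypothetical.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "demo@example.com"   # commit identity for this demo repo
git config user.name "Demo"
echo "v1" > index.html
git add index.html
git commit -qm "initial page"              # first revision checked in
echo "v2" > index.html
git commit -qam "bad update"               # second revision
git revert -n HEAD                         # stage a rollback of the last change
git commit -qm "roll back bad update"
cat index.html                             # the file is back to v1
```

Because every revision is kept, the bad update is still in the history; `git revert` simply records a new commit that undoes it, so nothing is lost.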

An automation team must establish process around its use of a VCS: understand and apply change review, commit messaging, conflict resolution, branching, release patterns and code promotion to the workflow. Study and adopt some Agile development practices. Scrum, with its sprints, lets different functional groups work toward a common goal and break the workload into digestible chunks. Pair programming ensures that all team members are well versed in the new methods and tools.

Agile methods generally succeed best with a cross-functional, meritocratic team structure. You may have situations where senior admins are taking direction from operations on things like YAML coding and VCS usage, or an operating system admin is working alongside a web developer. Just roll with it. Keep in mind that the goal is not to devalue work experience or to reshape jobs, but rather to develop -- as a team -- a better way of doing things.

I'll use Git as the VCS due to its popularity. Let's assume our team has already set up a local Git repository and all the members have read/write access. Since this example is designed to separate responsibilities, we need to define multiple users:

  • dc_tech: Provisions servers but does not have login rights to them.

  • web_dev: Provides the web page and makes sure it runs in development. Has full access to the development servers, but no access to the production servers.

  • ops_admin: Has full rights to all the servers, but is primarily concerned with deploying the production servers.  

The files for the whole series are available for download and comparison. The file layout for this edition is more granular since it is designed to separate responsibilities. Rather than explain why the main playbook is split in two, with both halves called by a third playbook, I will demonstrate with the workflow examples below. Here is the file layout by user responsibilities:

  • dc_tech responsible for:

    • dev_hosts: Inventory of development servers

    • prod_hosts: Inventory of production servers

  • web_dev responsible for:

    • website.yml: Playbook to deploy the web page

    • index.html: Web page to deploy

  • ops_admin responsible for:

    • server_config.yml: Playbook to set up Apache, firewalld and SELinux

    • all_tasks.yml: Playbook that runs server_config.yml then website.yml
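To preview how the split fits together, here is a sketch of two of these files. The play and task names match the console output in the workflows below; the host group and module details are my assumptions and may differ from the downloadable files:

```yaml
# website.yml -- sketch: deploy the web page (details assumed)
- name: Enable Webservers
  hosts: all
  become: true
  tasks:
    - name: Deploy the web page
      copy:
        src: index.html
        dest: /var/www/html/index.html

# all_tasks.yml -- sketch: run server_config.yml, then website.yml
# (import_playbook requires Ansible 2.4+; older releases used include)
- import_playbook: server_config.yml
- import_playbook: website.yml
```

Splitting the work this way means web_dev can run website.yml alone against development, while ops_admin runs all_tasks.yml to fully configure and deploy a production server.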

Workflow: A new server

A new production server has been provisioned, but not configured as part of the web farm yet.  dc_tech checks the status of the working repository:

[dc_tech@teamserver example.com_webfarm]$ git pull /automation_team/example.com_webfarm/
From /automation_team/example.com_webfarm
* branch            HEAD       -> FETCH_HEAD
Already up-to-date.

Everything is up to date, so dc_tech edits the inventory, commits the change to prod_hosts, and pushes it to the team repo:

[dc_tech@teamserver example.com_webfarm]$ vim prod_hosts
[dc_tech@teamserver example.com_webfarm]$ git commit -m "added new server to prod_hosts" prod_hosts
[master 030a099] added new server to prod_hosts
1 file changed, 1 insertion(+), 1 deletion(-)

[dc_tech@teamserver example.com_webfarm]$ git push
[ .. SNIP .. ] 
To /automation_team/example.com_webfarm/
  762bd6e..030a099  master -> master

ops_admin pulls down the new changes and reviews the commit messages:

[ops_admin@teamserver example.com_webfarm]$ git pull /automation_team/example.com_webfarm/
[ .. SNIP .. ] 
prod_hosts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

[ops_admin@teamserver example.com_webfarm]$ git log
commit 030a09904545be573d1d487d8279c5f8dbf157da
Author: DC Tech <dc_tech@example.com>
Date:   Fri Sep 1 13:47:44 2017 -0400
   added new server to prod_hosts
[ .. SNIP .. ] 

Once the changes are reviewed, ops_admin deploys the changes out to the web farm:

[ops_admin@teamserver example.com_webfarm]$ ansible-playbook -i prod_hosts all_tasks.yml

PLAY [This will run all the playbooks in the project] 
**************************

TASK [Gathering Facts] 
*********************************************************
ok: [192.168.124.178]
ok: [192.168.124.232]
ok: [192.168.124.124]

[ .. SNIP .. ] 

TASK [Install latest httpd] 
****************************************************
ok: [192.168.124.178]
ok: [192.168.124.124]
changed: [192.168.124.232]

[ .. SNIP .. ] 

TASK [Enable httpd] 
************************************************************
ok: [192.168.124.124]
ok: [192.168.124.178]
changed: [192.168.124.232]

[ .. SNIP .. ] 

TASK [Deploy the web page] 
*****************************************************
ok: [192.168.124.178]
ok: [192.168.124.124]
changed: [192.168.124.232]

PLAY RECAP 
*********************************************************************
192.168.124.124            : ok=11   changed=0    unreachable=0    failed=0
192.168.124.178            : ok=11   changed=0    unreachable=0    failed=0
192.168.124.232            : ok=11   changed=3    unreachable=0    failed=0

And finally test the changes on the new node:

[ops_admin@teamserver example.com_webfarm]$ curl 192.168.124.232
Deployed with Ansible!

Because Ansible tasks are idempotent, no matter how many times you run a playbook, the managed nodes end up in the same state. Review the output above and notice that nothing was changed on 192.168.124.124 and 192.168.124.178, while multiple changes were made to the new server, 192.168.124.232.
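The task names in the run above hint at what the underlying playbook looks like. A sketch (the module choices are my assumption; the real server_config.yml may differ):

```yaml
# Each task declares a desired state, not an action; once the state is
# reached, re-running the task reports "ok" instead of "changed".
- name: Install latest httpd
  yum:
    name: httpd
    state: latest
- name: Enable httpd
  service:
    name: httpd
    state: started
    enabled: true
```

One caveat: `state: latest` will report "changed" again whenever a newer package ships; `state: present` is the strictly repeatable choice if you want identical results on every run.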

Workflow: A change is needed on the web site

web_dev checks the local working copy against the master:  

[web_dev@teamserver example.com_webfarm]$ git pull /automation_team/example.com_webfarm/
From /automation_team/example.com_webfarm
* branch            HEAD       -> FETCH_HEAD
Already up-to-date.

Everything is up to date, so web_dev edits the web page, then deploys it to the development servers and tests:

[web_dev@teamserver example.com_webfarm]$ vim index.html
[web_dev@teamserver example.com_webfarm]$ ansible-playbook -i dev_hosts website.yml

PLAY [Enable Webservers] 
*******************************************************
[ .. SNIP .. ]

[web_dev@teamserver example.com_webfarm]$ curl 192.168.99.124
teamwork++ 
Deployed by the Automation team with Ansible! 

The change passes testing on the development servers, so web_dev commits the change and pushes it to the team repo:

[web_dev@teamserver example.com_webfarm]$ git commit -m "updated webpage" index.html
[ .. SNIP .. ]

[web_dev@teamserver example.com_webfarm]$ git push
[ .. SNIP .. ]
To /automation_team/example.com_webfarm/
  030a099..c8640ed  master -> master

ops_admin pulls the changes, reviews them, and applies them to the web farm:

[ops_admin@teamserver example.com_webfarm]$ git pull
[ .. SNIP .. ] 
index.html | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

[ops_admin@teamserver example.com_webfarm]$ git log
commit c8640ed405baf8e4f4fd6669ff7cbc402f320cb5
Author: Web Dev <web_dev@example.com>
Date:   Fri Sep 1 13:54:31 2017 -0400
   updated webpage
[ .. SNIP .. ] 

[ops_admin@teamserver example.com_webfarm]$ ansible-playbook -i prod_hosts website.yml

PLAY [Enable Webservers] 
*******************************************************

TASK [Gathering Facts] 
*********************************************************
ok: [192.168.124.232]
ok: [192.168.124.178]
ok: [192.168.124.124]

TASK [Deploy the web page] 
*****************************************************
changed: [192.168.124.232]
changed: [192.168.124.178]
changed: [192.168.124.124]

PLAY RECAP 
*********************************************************************
192.168.124.124            : ok=2    changed=1    unreachable=0    failed=0
192.168.124.178            : ok=2    changed=1    unreachable=0    failed=0
192.168.124.232            : ok=2    changed=1    unreachable=0    failed=0

Finally, ops_admin tests a sample of the managed nodes; note the updated content of the web page:

[ops_admin@teamserver example.com_webfarm]$ curl 192.168.124.124
     teamwork++
     Deployed by the Automation team with Ansible!
[ops_admin@teamserver example.com_webfarm]$ curl 192.168.124.232
     teamwork++
     Deployed by the Automation team with Ansible!

Even though these examples are relatively simple, they show how quickly a team-based automation framework can be set up. It may seem like overkill to use a VCS and Ansible for such small tasks, but the benefits increase as you scale up: for the examples above, the effort would be the same whether you had one host or 1,000.

This is the second part of a series; in future installments I plan to cover Roles, Modules, Templates, Ansible Galaxy and Ansible Tower.

 

Tim Quinlan (RHCE, RHCVA) is a Technical Account Manager (TAM) in the US Central region. Since 1996, he has applied Linux geekery in a wide array of industries including retail, energy, rail control and manufacturing before coming to Red Hat.

A Red Hat Technical Account Manager is a specialized product expert who works collaboratively with IT organizations to strategically plan for successful deployments and help realize optimal performance and growth. The TAM is part of Red Hat’s world-class Customer Experience and Engagement organization and provides proactive advice and guidance to help you identify and address potential problems before they occur. Should a problem arise, your TAM will own the issue and engage the best resources to resolve it as quickly as possible with minimal disruption to your business.