Ansible is a simple and powerful open source automation tool that can streamline many of your IT infrastructure operations. You can automate simple tasks like installing packages, or complex workflows such as deploying a clustered solution with multiple nodes or patching your operating system with many steps. Whether the workflows are simple or complex, you need to integrate appropriate optimization techniques into the Ansible playbook content.
This article covers some of the major optimization methods available in Ansible for speeding up playbook execution.
1. Identify slow tasks with callback plugins
A specific task in a playbook might look simple, yet it can be the reason the playbook runs slowly. You can enable callback plugins such as timer, profile_tasks, and profile_roles to measure each task's time consumption and identify which tasks are slowing down your plays.
Configure ansible.cfg with the plugins:
[defaults]
inventory = ./hosts
callbacks_enabled = timer, profile_tasks, profile_roles
Now execute the ansible-playbook command:
$ ansible-playbook site.yml

PLAY [Deploying Web Server] ************

TASK [Gathering Facts] **********************
Thursday 23 December 2021 22:55:58 +0800 (0:00:00.055) 0:00:00.055
Thursday 23 December 2021 22:55:58 +0800 (0:00:00.054) 0:00:00.054
ok: [node1]

TASK [Deploy Web service] *******************
Thursday 23 December 2021 22:56:00 +0800 (0:00:01.603) 0:00:01.659
Thursday 23 December 2021 22:56:00 +0800 (0:00:01.603) 0:00:01.658
...<output removed>...

PLAY RECAP **********************************
node1: ok=9 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

Playbook run took 0 days, 0 hours, 0 minutes, 14 seconds
Thursday 23 December 2021 22:56:12 +0800 (0:00:00.541) 0:00:14.100 *****
===============================================================================
deploy-web-server : Install httpd and firewalld ------- 5.42s
deploy-web-server : Git checkout ---------------------- 3.40s
Gathering Facts --------------------------------------- 1.60s
deploy-web-server : Enable and Run Firewalld ---------- 0.82s
deploy-web-server : firewalld permitt httpd service --- 0.72s
deploy-web-server : httpd enabled and running --------- 0.55s
deploy-web-server : Set Hostname on Site -------------- 0.54s
deploy-web-server : Delete content & directory -------- 0.52s
deploy-web-server : Create directory ------------------ 0.41s
Deploy Web service ------------------------------------ 0.04s
Thursday 23 December 2021 22:56:12 +0800 (0:00:00.541) 0:00:14.099
=====================================================================
deploy-web-server ------------------------- 12.40s
gather_facts ------------------------------- 1.60s
include_role ------------------------------- 0.04s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
total ------------------------------------- 14.04s
The output lists how long each task and each role took, sorted from slowest to fastest, along with the total run time. This information helps you identify which tasks take more time than the others.
2. Disable fact gathering
When a playbook executes, each play runs a hidden task, called gathering facts, using the
setup module. This gathers information about the remote node you're automating, and the details are available under the variable
ansible_facts. But if you're not using these details in your playbook anywhere, then this is a waste of time. You can disable this operation by setting
gather_facts: False in the play.
With gathering facts enabled:
$ time ansible-playbook site.yml

PLAY [Deploying Web Server] *********************

TASK [Gathering Facts] **************************
ok: [node1]
...<output removed>...

PLAY RECAP **************************************
node1: ok=9 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

ansible-playbook site.yml 3.03s user 0.93s system 25% cpu 15.526 total
With gather_facts: False disabling fact gathering, the run finishes slightly faster:
$ time ansible-playbook site.yml

PLAY [Deploying Web Server] ****************
...<output removed>...

PLAY RECAP **************************************
node1: ok=8 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

ansible-playbook site.yml 2.96s user 1.00s system 26% cpu 14.992 total
The more nodes you have, the more time you save by disabling fact gathering.
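As a minimal sketch of where the setting goes, the play below disables fact gathering at the play level. The play, host, and role names are taken from the sample output above, but the playbook layout itself is an assumption and not the original site.yml.

```yaml
---
# Hypothetical reconstruction of site.yml; only gather_facts is the point here.
- name: Deploying Web Server
  hosts: node1
  gather_facts: false   # skip the implicit setup task for this play
  tasks:
    - name: Deploy Web service
      ansible.builtin.include_role:
        name: deploy-web-server
```

Any task or template that references ansible_facts will find those values undefined with this setting, so check your roles for such references before disabling fact gathering.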
3. Configure parallelism
Ansible uses batches for task execution, which are controlled by a parameter called
forks. The default value for
forks is 5, which means Ansible executes a task on the first five hosts, waits for the task to complete, and then takes the next batch of five hosts, and so on. Once all hosts finish the task, Ansible moves to the next task and again works through the hosts in batches of five.
You can increase the value of forks in ansible.cfg, enabling Ansible to execute a task on more hosts in parallel:
[defaults]
inventory = ./hosts
forks = 50
You can also change the value of
forks dynamically while executing a playbook by using the
--forks option (
-f for short):
$ ansible-playbook site.yml --forks 50
A word of warning: Each fork is a worker process on the control node, so the more managed nodes Ansible works on in parallel, the more computing resources (CPU and memory) it uses. Configure forks according to the capacity of your Ansible control node.
4. Configure SSH optimization
Establishing a secure shell (SSH) connection is a relatively slow process that runs in the background. The global execution time increases significantly when you have more tasks in a playbook and more managed nodes to execute the tasks.
You can use the ControlMaster and ControlPersist features in ansible.cfg (in the ssh_connection section) to mitigate this issue.
- ControlMaster allows multiple simultaneous SSH sessions with a remote host to use a single network connection. This saves time on an SSH connection's initial processes because later SSH sessions use the first SSH connection for task execution.
- ControlPersist indicates how long SSH keeps an idle connection open in the background. For example, ControlPersist=60s keeps an idle connection open for 60 seconds:
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
5. Disable host key checking in a dynamic environment
By default, Ansible checks and verifies SSH host keys to safeguard against server spoofing and man-in-the-middle attacks. This also consumes time. If your environment contains immutable managed nodes (virtual machines or containers), then the key is different each time a host is reinstalled or recreated. You can disable host key checking for such environments by adding the host_key_checking parameter to your ansible.cfg file and setting it to False:
[defaults]
host_key_checking = False
I don't recommend this outside of a controlled environment. Make sure you have a clear understanding of the implications of this action before you use it in critical environments.
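One way to limit the exposure, sketched here under assumptions, is to turn off host key checking only for a group of short-lived hosts through a connection variable instead of globally. The group name ephemeral and the file path are hypothetical; ansible_host_key_checking is the variable the SSH connection plugin reads for this setting.

```yaml
# group_vars/ephemeral.yml (hypothetical inventory group of disposable hosts)
# Applies only to hosts in this group; all other hosts keep the default check.
ansible_host_key_checking: false
```

With this layout, long-lived servers outside the group are still verified against their known host keys.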
6. Use pipelining
When Ansible uses SSH, several SSH operations happen in the background for copying files, scripts, and other execution commands. You can reduce the number of SSH operations needed to run each module by enabling the pipelining parameter (it's disabled by default) in ansible.cfg:
# ansible.cfg
[ssh_connection]
pipelining = True
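Pipelining can also be switched on for a single play through the ansible_pipelining connection variable, which may be a convenient way to try it before changing ansible.cfg. The sketch below assumes an inventory group named nodes and an illustrative play name. When tasks use sudo, pipelining also needs requiretty to be disabled in the sudoers configuration on the managed nodes.

```yaml
---
- name: Pipelining demo          # illustrative play name
  hosts: nodes                   # assumed inventory group
  vars:
    ansible_pipelining: true     # enable pipelining for this play only
  tasks:
    - name: Run a module over the pipelined connection
      ansible.builtin.ping:
```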
7. Use execution strategies
By default, Ansible waits for every host to finish a task before moving to the next task, which is called linear strategy.
If you don't have dependencies on tasks or managed nodes, you can change the strategy to free, which allows Ansible to execute tasks on managed hosts until the end of the play without waiting for other hosts to finish their tasks:
- hosts: production servers
  strategy: free
  tasks:
You can develop or use more strategy plugins as needed, such as Mitogen, which uses Python-based executions and connections.
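As a fuller sketch of the free strategy, the play below lets each host run through its tasks at its own pace. The group name nodes and the httpd tasks are illustrative assumptions, loosely modeled on the web server example earlier in the article.

```yaml
---
- name: Free strategy demo
  hosts: nodes            # assumed inventory group
  strategy: free          # hosts do not wait for each other between tasks
  become: true
  tasks:
    - name: Install httpd
      ansible.builtin.package:
        name: httpd
        state: present

    - name: Start and enable httpd
      ansible.builtin.service:
        name: httpd
        state: started
        enabled: true
```

Task order is still preserved on each individual host, so the service task runs only after the install task on that same host. What the free strategy removes is the wait for slower hosts.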
8. Use async tasks
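When a task executes, Ansible waits for it to complete before closing the connection to the managed node. This can become a bottleneck when you have tasks with longer execution times (such as disk backups, package installation, and so on) because it increases global execution time. For such tasks, you can use async mode, which runs the task in the background on the managed node, together with a poll interval. If the following tasks do not depend on the long-running task, setting poll to 0 tells Ansible not to wait and to proceed with the next tasks. With a positive poll value, as in this example, Ansible checks back at that interval until the task finishes or the async time limit expires, which helps avoid SSH timeouts on long operations: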
---
- name: Async Demo
  hosts: nodes
  tasks:
    - name: Initiate custom snapshot
      shell: "/opt/diskutils/snapshot.sh init"
      async: 120  # Maximum allowed time in seconds
      poll: 5     # Polling interval in seconds
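A variation on the example above, as a sketch: setting poll to 0 starts the task and moves on immediately, and a later async_status task collects the result. The registered variable names, the placeholder debug task, and the retry values are illustrative choices, not part of the original playbook.

```yaml
---
- name: Async fire-and-forget demo
  hosts: nodes
  tasks:
    - name: Start the snapshot in the background
      ansible.builtin.shell: "/opt/diskutils/snapshot.sh init"
      async: 120          # maximum allowed run time in seconds
      poll: 0             # do not wait; continue with the next task
      register: snapshot_job

    - name: Do other work while the snapshot runs
      ansible.builtin.debug:
        msg: "Snapshot started as job {{ snapshot_job.ansible_job_id }}"

    - name: Wait for the snapshot to finish
      ansible.builtin.async_status:
        jid: "{{ snapshot_job.ansible_job_id }}"
      register: snapshot_result
      until: snapshot_result.finished
      retries: 24         # 24 checks at 5-second intervals cover the 120-second limit
      delay: 5
```

Keeping the final status check is a design choice: without it, a failed background job would go unnoticed by the play.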
Optimization is a journey
The global execution time of Ansible playbooks relies on multiple configurations. You can do your infrastructure a favor by finding the best combination of configuration parameters for your needs.
This isn't a complete list, of course. You can use many other parameters to control and optimize Ansible playbook execution, such as
run_once, and more. Refer to the documentation to learn more and apply the settings based on your Ansible environment.