Have you ever gotten to the end of your Ansible Playbook execution and found you needed to:
- Rescue from errors or partial execution of your tasks
- Capture a summary of the results per host for further review
If you have Ansible Automation Platform (AAP), you can use the techniques I described in How to use workflow job templates in Ansible to handle the first need, and you will need something like the set_stats module for the second (to be able to persist variables between workflow nodes).
In this article, I will cover the block/rescue feature in Ansible. You can incorporate it into your playbooks, whether or not you have AAP. You can then take these playbooks and run them in your AAP instance.
By learning this new technique, you can use the approach that suits you best.
What is a block?
A block is a logical grouping of tasks within a playbook that can be executed as a single unit. This makes it easy to manage complex playbooks by breaking them down into smaller, more manageable parts.
You can use blocks to apply options to a group of tasks and avoid repeating code, like in this example from the documentation.
tasks:
  - name: Install, configure, and start Apache
    block:
      - name: Install httpd and memcached
        ansible.builtin.yum:
          name:
            - httpd
            - memcached
          state: present
      - name: Apply the foo config template
        ansible.builtin.template:
          src: templates/src.j2
          dest: /etc/foo.conf
      - name: Start service bar and enable it
        ansible.builtin.service:
          name: bar
          state: started
          enabled: True
    when: ansible_facts['distribution'] == 'CentOS'
    become: true
    become_user: root
    ignore_errors: true
Notice that the keywords when, become, become_user, and ignore_errors are all applied to the block.
How to use blocks and rescue in Ansible
Blocks and rescue work together to provide error-handling capabilities in Ansible. Use the rescue keyword in association with a block to define a set of tasks that will be executed if an error occurs in the block. You can use the rescue tasks to handle errors, log messages, or take other actions to recover from the error.
Here is a high-level example:
---
- hosts: <hosts>
  tasks:
    - block:
        - <task1>
        - <task2>
        - <task3>
      rescue:
        - <rescue_task1>
        - <rescue_task2>
        - <rescue_task3>
      always:
        - <always_task>
You define tasks under the block keyword, which could be as simple as invoking the ansible.builtin.ping module, or you could have a combination of multiple tasks and including/importing roles.
The associated rescue keyword is where the playbook execution is sent, for each host, if any task in the block fails.
Finally, the always section executes for all nodes, no matter if they succeed or fail.
Some key ideas from this structure:
- rescue and always are optional features, which I will use for the specific purpose of demonstrating this "recover and summary" logic.
- When your playbook runs against a considerable number of hosts, the individual results become harder to track. This is where the ideas discussed in this article can help.
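As a concrete counterpart to the skeleton above, here is a minimal sketch that you could run against localhost to watch the three sections in action. The play name, task names, and messages are illustrative only, and the failure is simulated with the ansible.builtin.fail module.

```yaml
---
# Minimal sketch: names and messages here are illustrative assumptions.
- name: Minimal block/rescue/always demo
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Guarded tasks
      block:
        - name: Task that always fails (simulated error)
          ansible.builtin.fail:
            msg: "Simulated failure"

        - name: Never reached because the previous task failed
          ansible.builtin.debug:
            msg: "This message is not shown"
      rescue:
        - name: Runs only when a task in the block fails
          ansible.builtin.debug:
            msg: "Recovering on {{ inventory_hostname }}"
      always:
        - name: Runs whether or not the block failed
          ansible.builtin.debug:
            msg: "Finished {{ inventory_hostname }}"
```

If you remove the failing task, the rescue section is skipped and only the always section runs after the block.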
For the following example, the inventory file contains:
[nodes]
node1
node2
node3
Here is the playbook:
---
- name: Test block/rescue
  hosts: nodes
  gather_facts: false
  tasks:
    - name: Main block
      block:
        - name: Role 1
          ansible.builtin.include_role:
            name: role1
        - name: Role 2
          ansible.builtin.include_role:
            name: role2
        - name: Accumulate success
          ansible.builtin.set_fact:
            _result:
              host: "{{ inventory_hostname }}"
              status: "OK"
              interfaces: "{{ ansible_facts['interfaces'] }}"
      rescue:
        - name: Accumulate failure
          ansible.builtin.set_fact:
            _result:
              host: "{{ inventory_hostname }}"
              status: "FAIL"
      always:
        - name: Tasks that will always run after the main block
          block:
            - name: Collect results
              ansible.builtin.set_fact:
                _global_result: "{{ (_global_result | default([])) + [hostvars[item]['_result']] }}"
              loop: "{{ ansible_play_hosts }}"
            - name: Classify results
              ansible.builtin.set_fact:
                _result_ok: "{{ _global_result | selectattr('status', 'equalto', 'OK') | list }}"
                _result_fail: "{{ _global_result | selectattr('status', 'equalto', 'FAIL') | list }}"
            - name: Display results OK
              ansible.builtin.debug:
                msg: "{{ _result_ok }}"
              when: (_result_ok | length ) > 0
            - name: Display results FAIL
              ansible.builtin.debug:
                msg: "{{ _result_fail }}"
              when: (_result_fail | length ) > 0
          delegate_to: localhost
          run_once: true
...
Think about this playbook as an illustration of some logic that could be applied to a complex automation in the real world. Yes, you could run simpler actions to recover or issue a notification about the failure, but you want a summary of all results. Then you can use this summary in the always section to automate sending a notification by email or writing the individual results into a database.
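As one possible way to persist that summary, the sketch below adds a task that writes the classified results to a JSON file on the control node. It is only an illustration: the task name and the destination path /tmp/block_rescue_summary.json are assumptions, and the task is meant to be appended to the inner block of the always section so that it inherits delegate_to: localhost and run_once: true.

```yaml
# Hypothetical extra task for the inner block of the always section.
# The destination path is an assumption; adjust it for your environment.
- name: Save summary as JSON on the control node
  ansible.builtin.copy:
    content: "{{ {'ok': _result_ok, 'fail': _result_fail} | to_nice_json }}"
    dest: /tmp/block_rescue_summary.json
    mode: "0644"
```

The same two lists could feed a mail or database module instead; the file is just the simplest target to inspect.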
Also, the variables starting with _ reflect my personal naming convention. The underscore has no special meaning in Ansible.
- For this example, the roles in the main block don't do anything special. They represent the actions that you would put in the main block, which could fail at different points. In this simplified example, if a node succeeds, there will be a list of interfaces in the _result variable. Otherwise, the status will be set to FAIL.
- For each host the playbook is running on:
  - If the actions proceed without errors, the task Accumulate success will execute.
  - If an action fails in any of the roles, the flow goes to the rescue section for that host.
- The always section collects the results saved in the variable _result. Here is a little breakdown of the logic:
  - Up to this point, each host has a _result variable in its hostvars structure, holding either a success or a failed status.
  - The Collect results task, which runs once and is delegated to localhost, captures the individual results and adds them to the list _global_result.
  - The loop uses the Ansible magic variable ansible_play_hosts, which lists the hosts in the current play that have not failed or become unreachable. Rescued hosts stay in this list, so their FAIL results are still collected.
  - Classify results does some filtering to create one list of OK results and one of failed results. You can use these in notifications, reports, or to send to a database (this example just displays them).
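The roles themselves are not listed in this article. Judging only from the assertion text that appears in the run output, role1 could look something like the sketch below. Treat it as a hypothetical reconstruction: the file path and the idea that nodes_ok is a list of hostnames you supply (for example as an extra variable) are assumptions.

```yaml
---
# roles/role1/tasks/main.yml (hypothetical; the article does not list the roles)
# nodes_ok is assumed to be a list of hostnames that should succeed,
# for example passed with: -e '{"nodes_ok": ["node1"]}'
- name: Execution of role 1
  ansible.builtin.assert:
    that:
      - inventory_hostname in nodes_ok
```

Under that assumption, leaving a host out of nodes_ok is a simple way to force a failure for it and exercise the rescue section.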
If you run this playbook and no node fails, there is no need for rescue, and the display should show that results are OK in all nodes:
PLAY [Test block/rescue] *******************************************************
TASK [Role 1] ******************************************************************
TASK [role1 : Execution of role 1] *********************************************
ok: [node1] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [node2] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [node3] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [Role 2] ******************************************************************
TASK [role2 : Execution of role 2] *********************************************
ok: [node1]
ok: [node2]
ok: [node3]
TASK [role2 : Show network information] ****************************************
skipping: [node1]
skipping: [node2]
skipping: [node3]
TASK [Accumulate success] ******************************************************
ok: [node1]
ok: [node2]
ok: [node3]
TASK [Collect results] *********************************************************
ok: [node1 -> localhost] => (item=node1)
ok: [node1 -> localhost] => (item=node2)
ok: [node1 -> localhost] => (item=node3)
TASK [Classify results] ********************************************************
ok: [node1 -> localhost]
TASK [Display results OK] ******************************************************
ok: [node1 -> localhost] => {
"msg": [
{
"host": "node1",
"interfaces": [
"lo",
"enp7s0",
"enp1s0"
],
"status": "OK"
},
{
"host": "node2",
"interfaces": [
"lo",
"enp7s0",
"enp1s0"
],
"status": "OK"
},
{
"host": "node3",
"interfaces": [
"enp7s0",
"lo",
"enp1s0"
],
"status": "OK"
}
]
}
TASK [Display results FAIL] ****************************************************
skipping: [node1]
PLAY RECAP *********************************************************************
node1 : ok=6 changed=0 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
node2 : ok=3 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
node3 : ok=3 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
If you force a failure in some nodes, they will invoke the rescue section, and the summary will show the ones that succeeded and those that failed:
PLAY [Test block/rescue] *******************************************************
TASK [Role 1] ******************************************************************
TASK [role1 : Execution of role 1] *********************************************
ok: [node1] => {
"changed": false,
"msg": "All assertions passed"
}
fatal: [node2]: FAILED! => {
"assertion": "inventory_hostname in nodes_ok",
"changed": false,
"evaluated_to": false,
"msg": "Assertion failed"
}
fatal: [node3]: FAILED! => {
"assertion": "inventory_hostname in nodes_ok",
"changed": false,
"evaluated_to": false,
"msg": "Assertion failed"
}
TASK [Role 2] ******************************************************************
TASK [role2 : Execution of role 2] *********************************************
ok: [node1]
TASK [role2 : Show network information] ****************************************
skipping: [node1]
TASK [Accumulate success] ******************************************************
ok: [node1]
TASK [Accumulate failure] ******************************************************
ok: [node2]
ok: [node3]
TASK [Collect results] *********************************************************
ok: [node1 -> localhost] => (item=node1)
ok: [node1 -> localhost] => (item=node2)
ok: [node1 -> localhost] => (item=node3)
TASK [Classify results] ********************************************************
ok: [node1 -> localhost]
TASK [Display results OK] ******************************************************
ok: [node1 -> localhost] => {
"msg": [
{
"host": "node1",
"interfaces": [
"enp7s0",
"enp1s0",
"lo"
],
"status": "OK"
}
]
}
TASK [Display results FAIL] ****************************************************
ok: [node1 -> localhost] => {
"msg": [
{
"host": "node2",
"status": "FAIL"
},
{
"host": "node3",
"status": "FAIL"
}
]
}
PLAY RECAP *********************************************************************
node1 : ok=7 changed=0 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
node2 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=1 ignored=0
node3 : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=1 ignored=0
Notice that even though there were failures, the play recap counts those hosts as rescued rather than failed (failed=0, rescued=1 for node2 and node3).
Handle exceptions
I hope this article has given you some ideas about how to handle exceptions in your playbooks.
You can also think about what actions you want in your rescue section, such as displaying a message or performing some "undo" action, depending on what stage it reached before the failure.
Finally, you can execute the always section for each host or, as in my example, one time only.
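If you want the rescue section to know what went wrong before deciding on an "undo" action, Ansible exposes the variables ansible_failed_task and ansible_failed_result inside rescue. The following is a minimal sketch rather than part of my example playbook: the play and task names are illustrative, the failure is simulated, and the extra failed_task and reason keys are hypothetical additions to _result.

```yaml
---
# Minimal sketch: play name, task names, and the extra keys are assumptions.
- name: Record which task failed
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Main block
      block:
        - name: Simulated failing step
          ansible.builtin.fail:
            msg: "Simulated failure"
      rescue:
        - name: Accumulate failure with details
          ansible.builtin.set_fact:
            _result:
              host: "{{ inventory_hostname }}"
              status: "FAIL"
              # Name of the task that triggered the rescue
              failed_task: "{{ ansible_failed_task.name }}"
              # Message returned by that task, if it provided one
              reason: "{{ ansible_failed_result.msg | default('no message') }}"

        - name: Show the failure record
          ansible.builtin.debug:
            var: _result
```

A later task in rescue could branch on failed_task with a when condition to run the matching cleanup.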
About the author
Roberto Nozaki (RHCSA/RHCE/RHCA) is an Automation Principal Consultant at Red Hat Canada where he specializes in IT automation with Ansible. He has experience in the financial, retail, and telecommunications sectors, having performed different roles in his career, from programming in mainframe environments to delivering IBM/Tivoli and Netcool products as a pre-sales and post-sales consultant.
Roberto has been a computer and software programming enthusiast for over 35 years. He is currently interested in hacking what he considers to be the ultimate hardware and software: our bodies and our minds.
Roberto lives in Toronto, and when he is not studying and working with Linux and Ansible, he likes to meditate, play the electric guitar, and research neuroscience, altered states of consciousness, biohacking, and spirituality.