Ansible-Blog-Tower-Workflow-Convergence

In Red Hat Ansible Tower 3.1 we released a feature called Workflows. The feature effectively allowed users to compose job templates into arbitrary graph trees. A simple workflow we saw users creating was a linear pipeline, similar to the workflow below.

image4-1

The workflow feature also allowed branching. Each branch can run in parallel.

image1-2

But something was missing: the ability to wait for previous parallel operations to finish before proceeding. If this existed, you could simplify the above workflow (see below).

image3-2

In Red Hat Ansible Tower 3.4 the above workflow is now possible with the introduction of the Workflow Convergence feature.

For you computer sciencey folks, workflows are no longer restricted to a tree; you can create a directed acyclic graph (DAG). More simply, we call this convergence: two nodes are allowed to point to the same downstream node. The concept is best shown through an example. Above, we have a workflow with 3 nodes. The first two job templates run in parallel. When they both finish, the third, downstream convergence node will trigger.
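To make the tree versus DAG distinction concrete, here is a minimal Python sketch. It is not how Tower stores workflows; the dictionary layout and helper names are assumptions made for illustration, and the node names are placeholders for the three-node workflow described above.

```python
# Illustrative only: edges point from a parent node to its downstream children.
# The node names are placeholders, not taken from a real Tower workflow.
workflow = {
    "Create EC2 Instance": ["Install App"],
    "Create GCE Instance": ["Install App"],
    "Install App": [],
}


def parents_of(graph):
    """Map each node to the list of nodes that point at it."""
    parents = {node: [] for node in graph}
    for node, children in graph.items():
        for child in children:
            parents[child].append(node)
    return parents


def convergence_nodes(graph):
    """Return nodes with more than one parent, which a tree would not allow."""
    return sorted(
        node for node, node_parents in parents_of(graph).items()
        if len(node_parents) > 1
    )


print(convergence_nodes(workflow))  # ['Install App']
```

In a tree every node has at most one parent, so this list would always be empty; a non-empty result is what convergence adds.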

In this blog post we will cover the changes to workflow failure scenarios, how workflow node failure and success propagate, how this affects the runtime graph, and how to create a workflow that responds to ALL failures rather than ANY failure. After reading this blog post you will have an understanding of how the Workflows feature chooses to execute the graph and how to alter the ANY vs. ALL behavior using the utility playbook method.

Workflow Execution Scenario

Prior to the release of Ansible Tower 3.4, a workflow’s failure or success was determined by the final node’s resulting status. The workflow would be marked failed if either Install App job failed.

image1-2

Now workflow success or failure works more like exception handling. If one or more jobs spawned from a workflow result in a failure WITHOUT a failure handler, then the workflow is marked failed; otherwise it’s marked successful. A failure handler is an always or failure path emanating from a node. Below is an example of how a common failure scenario would be handled using a workflow.

image6

The above workflow creates an EC2 instance and deploys our application code. If either of these two jobs fails, then the EC2 instance will be cleaned up by the Delete EC2 Instance job. The workflow job is not marked as failed, because the Delete EC2 Instance job is the failure handler. If the Delete EC2 Instance job itself fails, the workflow will be marked failed and the user knows that manual intervention is required to inspect why the cleanup process failed and to manually delete the effectively orphaned EC2 instance.
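The exception-handling rule can be sketched in Python as follows. This is a simplified model under stated assumptions, not Tower's implementation: `NodeResult`, `has_failure_handler`, and `workflow_status` are hypothetical names, "Deploy App" is a placeholder job name, and the sketch only distinguishes the "successful" and "failed" statuses.

```python
from dataclasses import dataclass


@dataclass
class NodeResult:
    name: str
    status: str                # "successful" or "failed"
    has_failure_handler: bool  # True if the node has a failure or always path


def workflow_status(results):
    """Failed if any failed job lacks a failure handler, otherwise successful."""
    for result in results:
        if result.status == "failed" and not result.has_failure_handler:
            return "failed"
    return "successful"


# The deploy fails, but the cleanup job handles it and succeeds.
handled = [
    NodeResult("Create EC2 Instance", "successful", True),
    NodeResult("Deploy App", "failed", True),
    NodeResult("Delete EC2 Instance", "successful", False),
]

# The cleanup job itself fails and nothing handles that failure.
unhandled = [
    NodeResult("Create EC2 Instance", "successful", True),
    NodeResult("Deploy App", "failed", True),
    NodeResult("Delete EC2 Instance", "failed", False),
]

print(workflow_status(handled))    # successful
print(workflow_status(unhandled))  # failed
```

The second case is the one that calls for manual intervention: the failure of the cleanup job has no handler, so it surfaces as the status of the whole workflow.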

image3-2

Now let’s look at a slightly more complex example. In the above workflow, our intent is to run the "Install App" Job Template ONLY if both instance creation jobs succeed. However, the workflow doesn’t work like that. Instead, the "Install App" Job Template will run if either parent succeeds. In the next section I will show you how to get the wanted AND behavior instead of OR.

AND vs. OR

Let’s take a look at what I’ve said above, but in more general terms. Consider child nodes with multiple parents connected by a success relationship. If any one of the parents succeeds, the child node will run. Sometimes this is the wanted behavior; at other times you may want to run a child node only if all parents succeed. So how do we do that? To accomplish this we will create a utility Job Template that we put before the convergence node that we want to make an ALL case (i.e. before Install App in the previous example).
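The two behaviors differ only in how the parent results are combined. A minimal Python sketch, assuming each parent reports either "successful" or "failed" (the function names are hypothetical and chosen for illustration):

```python
def runs_with_any(parent_statuses):
    """Default behavior: the child runs if at least one parent succeeded."""
    return any(status == "successful" for status in parent_statuses)


def runs_with_all(parent_statuses):
    """Wanted behavior: the child runs only if every parent succeeded."""
    return all(status == "successful" for status in parent_statuses)


# One instance creation job succeeded, the other failed.
mixed = ["successful", "failed"]
print(runs_with_any(mixed))  # True
print(runs_with_all(mixed))  # False
```

With one successful and one failed parent, the default rule still runs the child, which is exactly the case the utility Job Template is meant to catch.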

AND Utility Job Template

image2-3

Again, our goal is to have Install App run only if both “Create GCE Instance” and “Create EC2 Instance” succeed. To accomplish this goal we have created a new Job Template named “Utility All” and replaced our previous convergence node, “Install App”, with “Utility All”. The playbook associated with “Utility All” is shown below. The playbook gets the parent jobs and loops over them; if any of the parent jobs failed, the playbook itself fails. This achieves our wanted behavior of only running “Install App” when all parent nodes succeed.

# and_util.py
---
- hosts: localhost
  gather_facts: false
  vars:
    this_playbook_should_fail: false
    job_id: "{{ lookup('env', 'JOB_ID') }}"
    tower_base_url: "https://{{ lookup('env', 'TOWER_HOST') }}/api/v2"
    tower_username: "{{ lookup('env', 'TOWER_USERNAME') }}"
    tower_password: "{{ lookup('env', 'TOWER_PASSWORD') }}"
    tower_verify_ssl: "{{ lookup('env', 'TOWER_VERIFY_SSL') }}"
  tasks:
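    # Requires tower-cli and jq on the host running the playbook.
    # The registered value is not referenced by the tasks that follow.
    - name: "Get Workflow job id to which this job belongs"
      shell: tower-cli job get {{ job_id }} -f json | jq ".related.source_workflow_job" | sed 's/\/"$//' | sed 's/.*\///'
      register: workflow_job_id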

    - name: "Get Workflow node id for this job"
      uri:
        url: "{{ tower_base_url }}/workflow_job_nodes/?job_id={{ job_id }}"
        validate_certs: "{{ tower_verify_ssl }}"
        force_basic_auth: true
        user: "{{ tower_username }}"
        password: "{{ tower_password }}"
      register: result

    - name: "Get parent workflow nodes for this workflow node"
      uri:
        url: "{{ tower_base_url }}/workflow_job_nodes/?success_nodes={{ result.json.results[0].id }}"
        validate_certs: "{{ tower_verify_ssl }}"
        force_basic_auth: true
        user: "{{ tower_username }}"
        password: "{{ tower_password }}"
      register: result

    - name: "Fail this playbook if a parent node failed"
      fail:
        msg: "Parent workflow node {{ item }} failed"
      when: "item.summary_fields.job.status == 'failed'"
      loop: "{{ result.json.results }}"

image5-1
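The final task of the playbook boils down to a filter over the parent nodes returned by the API. The Python sketch below reproduces that check on a hand-written payload; the sample data is invented and only includes the fields the playbook itself reads (`id` and `summary_fields.job.status`), and `failed_parents` is a hypothetical helper name.

```python
def failed_parents(results):
    """Return the ids of parent workflow nodes whose job status is 'failed'."""
    return [
        node["id"]
        for node in results
        if node["summary_fields"]["job"]["status"] == "failed"
    ]


# Invented sample shaped like the 'results' list the playbook loops over.
sample_results = [
    {"id": 11, "summary_fields": {"job": {"status": "successful"}}},
    {"id": 12, "summary_fields": {"job": {"status": "failed"}}},
]

failed = failed_parents(sample_results)
print(failed)  # [12]

# Like the playbook, treat any failed parent as a reason to fail.
should_fail = len(failed) > 0
print(should_fail)  # True
```

As in the playbook, only the 'failed' status is treated as a failure here; any other status a parent job might report is outside the scope of this sketch.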

Conclusion

In our own testing we found this Workflow Convergence feature mapped better to actual working practices, so we hope you find it as useful for your own needs. In this blog post we have gone over how workflow failure scenarios have changed to accommodate the new convergence node feature, how the run-time graph is created, and how to use and change the default workflow convergence method. I invite you to check out the other workflow enhancements we added in Ansible Tower.


About the author

Chris is a Principal Software Engineer, Ansible, contributing to the Red Hat Ansible Tower backend APIs. Outside of work, Chris hones his skills as an amateur carpenter on his house. To learn more, you can follow him on Twitter at @oldmanmeyers85.