Introduction
Red Hat Ansible Automation Platform has seen wide-scale adoption in a variety of automation domains, however with edge use cases becoming more mainstream, the thought process around automation must shift from “complete a task immediately” to being able to run automation now and later, and respond to incoming automation requests from devices that are yet unmanaged.
Automation in a hybrid cloud environment
In today’s hybrid cloud environment, automation exists in a tightly controlled and predictable space, meaning it’s easy to determine what endpoints are reachable and available for connection. In practice, this manifests as inventory syncs from our various management planes (think AWS/Azure/GCP/VMware) and then targeting the devices brought into Controller via those inventory syncs with automation. Cross connectivity shouldn’t be an issue: If we can see the device in a management plane, we can contact and automate against it. In addition, if there are exceptions to the “connectivity everywhere” model, Red Hat Ansible Automation Platform has features and functionality to help address more complex connectivity circumstances.
We can even take this automation approach one step further by pulling those management planes under the management of our automation, giving us the ability to really automate end-to-end. For example, if we want to automate against an endpoint that doesn’t yet exist, we can instruct the appropriate management plane and related systems to bring that endpoint into existence, then perform our automation against it.
What this boils down to is: We have significant control over whether our automation works, because we control both the endpoints and what controls the endpoints. Our automation exists in a proactive paradigm, able to automate and coordinate against all layers of the technology stack.
This approach, while very powerful, doesn’t hold up at the edge for a simple reason: We don’t necessarily control all the layers of the stack. The endpoints are run without a unified management plane and exist on different platforms, in different locations, and are subject to different rules around connectivity. Thus, the paradigm shift: In this environment we must still automate, but we need to balance both a reactive and proactive approach.
Automation at the edge
Without direct control over the various management planes and layers of the technology stack, our automation strategy must shift to be both proactive and reactive, responding in real time to changes in the environment, but also run automation on a schedule to enforce our desired configurations and workloads across a geographically distributed footprint of edge endpoints.
Reactive automation
First, our automation strategy must be reactive to respond to changes in the environment in real time. One of the most common events at the edge is the onboarding and re-imaging of systems, initiated by a process or person outside of the automation ecosystem. In reality, this is outside of the direct automation itself, instead being a function of the automation platform. Providing the appropriate access to resources within the platform at any time, from any allowed location, without requiring manual intervention being the main function of the platform. From there, the platform also facilities a form of event-driven automation, triggering work to be done as new devices come online.
Caption: Here, the device initiates the process, as opposed to the automation platform initiating it.
In practice, this is typically addressed by adding a call home function to the deployment methodology, allowing the devices themselves to initiate interaction with the automation platform according to their availability and schedule over a centralized attempt to reach out. This call home functionality typically sends along two key pieces of information: a unique identifier for the device, and a way to contact it to begin automating against it.
The following is an example of a kickstart post file which leverages systemd
to perform call home functionality via an Ansible Playbook run locally upon first boot. The playbook itself handles determining a unique identifier for the system (based off a Mac address in this example), adding the device to an inventory, and also defining Ansible’s magic variable of ansible_host
so automation runs know how to contact the system:
%post
# create playbook for controller registration
cat > /var/tmp/aap-auto-registration.yml <<EOF
---
- name: register a r4e system to ansible controller
hosts:
- localhost
vars:
ansible_connection: local
controller_url: https://ansible-controller.company.local/api/v2
controller_inventory: Edge Systems
provisioning_template: Post-Install Edge System
module_defaults:
ansible.builtin.uri:
user: edge-callhome-user
Password: edge-callhome-password
force_basic_auth: yes
validate_certs: no
tasks:
- name: find the id of {{ controller_inventory }}
ansible.builtin.uri:
url: "{{ controller_url }}/inventories?name={{ controller_inventory | regex_replace(' ', '%20') }}"
register: controller_inventory_lookup
- name: set inventory id fact
ansible.builtin.set_fact:
controller_inventory_id: "{{ controller_inventory_lookup.json.results[0].id }}"
- name: create host in inventory {{ controller_inventory }}
ansible.builtin.uri:
url: "{{ controller_url }}/inventories/{{ controller_inventory_id }}/hosts/"
method: POST
body_format: json
body:
name: "edge-{{ ansible_default_ipv4.macaddress | replace(':','') }}"
variables:
'ansible_host: {{ ansible_default_ipv4.address }}'
register: create_host
changed_when:
- create_host.status | int == 201
failed_when:
- create_host.status | int != 201
- "'already exists' not in create_host.content"
- name: find the id of {{ provisioning_template }}
ansible.builtin.uri:
url: "{{ controller_url }}/workflow_job_templates?name={{ provisioning_template | regex_replace(' ', '%20') }}"
register: job_template_lookup
- name: set the id of {{ provisioning_template }}
ansible.builtin.set_fact:
job_template_id: "{{ job_template_lookup.json.results[0].id }}"
- name: trigger {{ provisioning_template }}
ansible.builtin.uri:
url: "{{ controller_url }}/workflow_job_templates/{{ job_template_id }}/launch/"
method: POST
status_code:
- 201
body_format: json
body:
limit: "edge-{{ ansible_default_ipv4.macaddress | replace(':','') }}"
EOF
# create systemd runonce file to trigger playbook
cat > /etc/systemd/system/aap-auto-registration.service <<EOF
[Unit]
Description=Ansible Automation Platform Auto-Registration
After=local-fs.target
After=network.target
ConditionPathExists=!/var/tmp/post-installed
[Service]
ExecStartPre=/usr/bin/sleep 20
ExecStart=/usr/bin/ansible-playbook /var/tmp/aap-auto-registration.yml
ExecStartPost=/usr/bin/touch /var/tmp/post-installed
User=root
RemainAfterExit=true
Type=oneshot
[Install]
WantedBy=multi-user.target
EOF
# Enable the service
systemctl enable aap-auto-registration.service
%end
In addition to adding the device to an inventory in the automation controller, this playbook also triggers a workflow run, limited to just the device itself. This becomes a kind of event-driven automation, where the event is a new device being installed, and the automation being triggered is the standard configuration and deployment of a workload to the device.
Proactive automation
This is closer to our hybrid cloud automation paradigm above, where we either have some knowledge of the endpoints (which is being handled by our reactive automation approach) or know where we can get information before attempting to run automation. At the edge, we want to keep our devices compliant with various configuration and security policies to ensure they’re functional, performant, and secure, however we’re not always confident we’ll be able to contact them immediately due to connectivity disruptions or site issues.
There’s three tools in the automation tool belt to help us here: automation mesh to help address connectivity challenges (same as above), schedules in the automation controller, and modifying our playbooks to have some level of built-in error handling via blocks and wait for connection. Schedules allow the automation controller to run our automation repeatedly, hitting the devices available at the time, and enforcing the desired configuration and security settings. If a device is unavailable for whatever reason, whether there’s a connectivity disruption, conflicting or scheduled maintenance, the automation controller will denote the failure and try again at the next scheduled time, allowing for repeated enforcement against devices as they’re available.
Finally, playbooks can be written to allow for interruptions in connectivity, or to wait for a device to become available before attempting to continue with other tasks. These constructs, such as the rescue functionality of a block, or the wait_for_connection
module, help reduce errors caused by connectivity interruptions or poor network connectivity. Since these are deployed at the playbook level, they can wrap around specific long-running or troublesome tasks that involve moving data across a very slow network.
Conclusion
The edge is driving a paradigm shift in automation, from a primary proactive approach to a combination of proactive and reactive. This paradigm shift does drive some changes in the approach to automation, but also reinforces the need for automation, and allows for interoperability between centralized and decentralized systems.
***This blog was co-authored by Josh Swanson
About the author
Browse by channel
Automation
The latest on IT automation for tech, teams, and environments
Artificial intelligence
Updates on the platforms that free customers to run AI workloads anywhere
Open hybrid cloud
Explore how we build a more flexible future with hybrid cloud
Security
The latest on how we reduce risks across environments and technologies
Edge computing
Updates on the platforms that simplify operations at the edge
Infrastructure
The latest on the world’s leading enterprise Linux platform
Applications
Inside our solutions to the toughest application challenges
Original shows
Entertaining stories from the makers and leaders in enterprise tech
Products
- Red Hat Enterprise Linux
- Red Hat OpenShift
- Red Hat Ansible Automation Platform
- Cloud services
- See all products
Tools
- Training and certification
- My account
- Customer support
- Developer resources
- Find a partner
- Red Hat Ecosystem Catalog
- Red Hat value calculator
- Documentation
Try, buy, & sell
Communicate
About Red Hat
We’re the world’s leading provider of enterprise open source solutions—including Linux, cloud, container, and Kubernetes. We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.
Select a language
Red Hat legal and privacy links
- About Red Hat
- Jobs
- Events
- Locations
- Contact Red Hat
- Red Hat Blog
- Diversity, equity, and inclusion
- Cool Stuff Store
- Red Hat Summit