Modern life is leading to a permanent state of online connectivity. If you’re reading this article from your cell phone, you have already used AI. It feels almost natural that as soon as you wake up, you open social media and answer messages. You might check emails while eating breakfast, and listen to podcasts or Spotify while driving to school or work. Once you log in to your computer, you access enterprise applications and services, including desktop applications that might be hosted as cloud services.
It’s all fun and joy, until you get a message telling you that you are offline.
Networking is like electricity: You will only think about it when it’s gone. It is fundamental for the modern world.
“First thing to know about any technology is that connectivity is the very first point of failure or success.”
Analysts now mention that network automation is a critical and high-risk activity. In this article we cover some insights about modern network operations, why it is risky to modernize network operations, and how to manage risk with a “Start small and think big” approach that we have successfully implemented in hundreds of organizations.
Why are network operations slow to modernize?
Most IT groups have transformed how they operate, a shift that started more than 20 years ago with virtualization. Now we have containerization, and we are moving even faster. We have seen the public cloud wave, the private cloud, edge computing, and DevOps initiatives thriving.
So why are network operations slow to modernize?
The criticality of networking in the modern world
Most, if not all, applications and systems need connectivity, including payment systems, self-service stations, ATMs, mobile phones, streaming, social media, online orders, and email. Data has to be transferred from one location to another.
All the data generated or consumed is transported over networks, although the type of traffic and the network architecture might change. Traffic can start or end at an enterprise network, commonly known as campus, or it might go through a private virtual network, or through the Internet. Regardless of the type of interaction, your traffic will most likely traverse the Internet at some point, which is a web of interconnected carrier and service provider networks.
Networks are ecosystems, and are getting even more complex
As such, it is difficult to predict what will happen when you change or lose an interface, line card, or device. Even simple events, such as interface and link failures, can create major network congestion.
Regardless of redundant configurations and connections, the traffic in the network is not evenly distributed, and it is very hard to predict how traffic flows today and how it will flow after a change. In the same way, adding 20-30% more traffic can cause specific connections to start dropping packets due to congestion. Some major companies use network planning tools that simulate traffic and topology changes based on constraints and mathematical models, and we observe significant investments in AI for network operations. However, most network behavior is still hard to predict.
Add the fact that most networks are now multi-vendor, and that teams might have migrations in progress, and you have a level of complexity that is hard to manage without automation.
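To make the point about uneven traffic distribution concrete, here is a minimal Python sketch. The link names, capacities and loads are made-up numbers for illustration only, and the model assumes traffic grows uniformly on every link, which real networks rarely do.

```python
# Hypothetical link capacities and loads in Gbps (illustrative numbers only).
links = {
    "core-a": {"capacity": 10.0, "load": 8.5},
    "core-b": {"capacity": 10.0, "load": 4.0},
    "core-c": {"capacity": 10.0, "load": 2.5},
}


def average_utilization(links, growth=0.0):
    """Return network-wide utilization (0.0 to 1.0) after uniform traffic growth."""
    total_load = sum(link["load"] for link in links.values()) * (1 + growth)
    total_capacity = sum(link["capacity"] for link in links.values())
    return total_load / total_capacity


def congested_links(links, growth):
    """Return the names of links whose load would exceed capacity after growth."""
    return sorted(
        name
        for name, link in links.items()
        if link["load"] * (1 + growth) > link["capacity"]
    )


if __name__ == "__main__":
    for growth in (0.0, 0.25):
        print(growth, average_utilization(links, growth), congested_links(links, growth))
```

With these numbers, 25% more traffic moves average utilization from 50% to 62.5%, which looks healthy, yet the already busy core-a link would exceed its capacity and start dropping packets.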
Network operations in general are risk averse
Network changes are mostly staggered, scheduled during maintenance windows at night, and limited to a few configuration changes at a time because of the risk. Regardless of how much effort you put forth, you could still inadvertently break things.
Network operations teams, in general, are risk averse because they know a single change can cause a major disruption. The burden of that responsibility makes you extremely careful when implementing changes.
I learned this the hard way at my first job, in a bank, when I did not enable some interfaces properly in a new bundle. The resulting broadcast storm brought the corporate building to a halt.
Consumer behavior demands more uptime
Fifteen years ago, when I was a network administrator, it was typical to have hours of downtime very late at night. Online transactions were not that common, and you would not expect a lot of people to browse the Internet at midnight.
Today, this is no longer possible. Network operations must deliver uptime for everything, everywhere, all at the same time.
Network services provisioning is now more dynamic, as services move through different paths and devices. Network teams are expected to be responsive, support daily requests to enable new connections and locations, fix issues and enhance the network configuration. At the same time, they must ensure compliance with the latest security standards.
It is like changing the wheel of a moving car, and stopping can result in fines and legal issues.
Siloed or Do-it-yourself approaches
In our experience, do-it-yourself solutions are simple to start and hard to sustain.
Most of the approaches to network automation have been led by individuals who need to solve their own immediate needs, leading to a task-based (I solve my problem) approach, instead of a process-based (I redefine and solve a whole process) team or company approach.
There are multiple lessons we learned while deploying network automation with our customers, using Red Hat Ansible Automation Platform to provide the resources to create, manage and scale automation at an enterprise level:
- First lesson: Time and know-how are precious; spend less time building and maintaining your own scripts or siloed playbooks. Invest your time in creating and executing network automation content, knowing your effort will have enterprise support with the Red Hat subscription.
- Second lesson: Your automation strategy cannot depend on a single person. The first step is to move the content off the engineer’s computer and make it available to the team with all the enterprise governance in place.
- Third lesson: Share and scale. Build on what you have and aim to build automation communities of practice.
Risk management in network operations with a “Start small, think big” approach for network automation
Fortune 500 companies have invested heavily in modernizing IT operations, including networking. From our major customers and partners, we see that a “Start small, think big” approach succeeds because it mitigates risk while driving a cultural change.
Most enterprise networks are multi-vendor and multi-domain (a.k.a. multi-architecture). You might find a combination of campus routing, switching, WiFi, load balancers, firewalls and datacenter fabrics, monitoring and observability platforms, and DDI (DNS, DHCP, IPAM) solutions. For this reason, we partnered with network technology leaders to simplify the network automation journey with supported integrations, available with a Red Hat Ansible subscription, to leverage existing investments and networking technologies already in production. The following image depicts our ecosystem only for network automation.
How do you tame complexity, reduce toil and manage risk at the same time? This section explains how our customers managed a cultural change without disrupting network operations while learning and using Ansible Automation Platform for network automation.
Start with valuable read-only use cases
Renewals and investments happen in cycles, and as a network administrator, you can encounter end users connecting rogue devices at remote locations. Networks are extremely complex, and you have to manage tactical day-to-day operations while training your network engineers to learn and use automation. You can start automating and fine-tuning Ansible Playbooks and roles for high-value, read-only scenarios such as:
- Network audit and reporting: Verify that all configuration and network state are compliant.
- Compliance: Run compliance checks and pass your internal audit faster. Adapt faster to new rules and company definitions.
- Network checklists: Run common checks such as ping, traceroute, and interface and protocol verifications. Validate network state and health automatically at the start and end of the day, and before and after maintenance windows.
- Configuration backups: Gather configuration files easily and export them to enterprise storage locations.
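The audit and compliance use cases above come down to comparing gathered configuration against a policy without changing anything on the device. The following Python sketch illustrates only that check logic, not how Ansible Automation Platform implements it. The function name, the policy lines and the sample configuration are all hypothetical, and it assumes the configuration was already collected as text (for example, from a backup).

```python
def audit_config(config_text, required, forbidden):
    """Return a read-only compliance report for one device configuration."""
    # Normalize to a set of stripped, non-empty lines for exact-line matching.
    lines = {line.strip() for line in config_text.splitlines() if line.strip()}
    missing = [rule for rule in required if rule not in lines]
    present_forbidden = [rule for rule in forbidden if rule in lines]
    return {
        "compliant": not missing and not present_forbidden,
        "missing": missing,
        "forbidden": present_forbidden,
    }


# Hypothetical policy: lines that must, and must not, appear in every config.
REQUIRED = ["service password-encryption", "logging host 192.0.2.10"]
FORBIDDEN = ["ip http server"]

# Hypothetical device configuration, as it might look in a backup file.
sample_config = """
hostname branch-sw-01
service password-encryption
ip http server
"""

report = audit_config(sample_config, REQUIRED, FORBIDDEN)
print(report)
```

Because the check only reads text and reports findings, it carries no risk to production, which is why this kind of use case is a comfortable starting point. Exact-line matching is a simplification; real policies usually need pattern matching or structured, parsed configuration.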
Modernize network operations procedures for configuration changes
Once your team has learned how to use Ansible Automation Platform without risk, the network automation engineers will be able to start working on low-risk configuration changes.
You can still roll out changes in a staggered manner. The use cases might include:
- Configuration hygiene: Standardize the configuration, starting with low-risk changes such as syslog, SNMP, and banners. After proven success, move to VLANs, access lists, and policy configuration, and keep growing in complexity and scope.
- Integrate with sources of truth: Network sources of truth for resources might be NetBox or Nautobot, and the configuration source of truth might be Git. DDI (DNS, DHCP, IPAM) solutions such as Infoblox can also be integrated to streamline service provisioning.
- Configuration restore: Obtain configuration files from the storage location, and deploy them to the network devices.
- Network OS upgrades and patching: Standardize the full workflow, including pre-checks and post-checks. You can reuse the network checklists created as read-only, and add extra validations such as hash verification after distributing the OS image binaries.
- Automate network device provisioning, retirement, and critical migration scenarios: These use cases are even simpler to automate when you have SDN (Software-Defined Networking) controllers such as Cisco Meraki, Arista CloudVision, Juniper Apstra, or Cisco ACI. You can include device onboarding and general settings as part of the automation workflow.
For migrations, use the common language provided by Ansible to extract configuration from existing devices, and convert it to the new OS versions you need to deploy.
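The hash verification mentioned for OS upgrades can be sketched in a few lines of Python using the standard `hashlib` module. This is an illustration under assumptions: the function names are hypothetical, and SHA-256 is assumed as the algorithm, whereas your vendor may publish a different digest type for its images.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large OS images do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_intact(path, expected_sha256):
    """Return True when the file digest matches the published checksum."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A workflow would typically run a check like this after copying the image and before scheduling the upgrade, and stop if it returns False.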
Redefine Standard Operational Procedures
These can range from simplifying maintenance window tasks to defining a consistent automated response for network issues and anomalies.
- Simplifying maintenance windows: Network changes might be defined by engineering, deployed by operations, and monitored by the NOC. There is a lot of manual coordination, including opening tickets, notifying, running checklists and waiting for emails. The implementation might take a few hours or minutes, but most of the maintenance window is waiting time, including approvals and validations. Most if not all of these activities can be automated. You can streamline ticket management, checklists and configuration changes, including backups and rollbacks, with Red Hat Ansible Automation Platform and the supported integrations it has with ITSM and multi-vendor network solutions.
- Respond faster and consistently to anomalies, even before failures: Event-Driven Ansible allows you to integrate with network observability and monitoring solutions. External solutions can detect and send information about anomalies, and Event-Driven Ansible can trigger an automated response based on predefined rules. The response can include opening a ticket in an ITSM system such as ServiceNow, gathering information or running commands on the affected devices, and updating tickets with the relevant information automatically. You can even define common responses to issues and automate your first level of support.
- Automate day-to-day operations: One of the major challenges is that network specialists are inundated with day-to-day, low-value activities. You can automate common daily activities such as firewall policy configuration, port enablement, and the creation of new enterprise services such as VPNs using Ansible Automation Platform, keeping the required governance and controls between IT teams and network teams.
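The idea of a predefined rule that maps an incoming event to a consistent response can be illustrated with plain Python. This is not Event-Driven Ansible rulebook syntax; the event fields, rule names and action names below are hypothetical, and the sketch only returns the planned actions instead of calling any ITSM or device.

```python
# Hypothetical rules: each maps a condition on event fields to ordered actions.
RULES = [
    {
        "name": "interface down",
        "condition": {"type": "interface", "state": "down"},
        "actions": ["open_ticket", "collect_interface_diagnostics", "update_ticket"],
    },
    {
        "name": "critical cpu",
        "condition": {"type": "cpu", "severity": "critical"},
        "actions": ["open_ticket", "collect_process_list"],
    },
]


def matches(condition, event):
    """Return True when every field in the condition equals the event's value."""
    return all(event.get(key) == value for key, value in condition.items())


def plan_response(event, rules=RULES):
    """Return the actions of the first matching rule, or an empty list."""
    for rule in rules:
        if matches(rule["condition"], event):
            return list(rule["actions"])
    return []


# Hypothetical event, as a monitoring tool might send it.
event = {"type": "interface", "state": "down", "device": "edge-rtr-01"}
print(plan_response(event))
```

Keeping the rules as data rather than code means the same anomaly always produces the same first-level response, and the team can review and extend the rules without touching the matching logic.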
One of our priorities is to make network automation easier to consume and help our customers get started. We are developing these common use cases for the automation journey with our technology partners, released as Ansible validated content, which will be available with the Red Hat subscription in Ansible automation hub.
In conclusion
Traditionally, network operations depended on superheroes: highly skilled employees who carry the know-how of production operations in their heads. Whenever there is a failure, they are the point of contact, even when they are on vacation. They cannot shift roles because the operation depends on them.
What they know cannot be shared easily without the right tooling and procedures.
Network automation is a disruptor. It can foster a collaborative culture, reduce the burden of network operations and optimize processes.
What are the top 3 benefits for companies? They can:
- Reduce the number of unplanned downtimes
- Reduce time to solve unplanned downtimes (MTTR)
- Maintain compliance with auditors and regulators by upgrading and patching network devices and deploying configuration policies faster.
What are the top 3 benefits for network engineers? They can:
- Become an automation champion in the organization
- Shift roles and show value by automating common day-to-day tasks
- Allocate time and use knowledge for network optimizations and engineering
Why Red Hat Ansible Automation Platform? It can:
- Provide what is needed for enterprise automation.
- Help your organization invest time and effort where it is needed, with the confidence that your investments will have Red Hat’s support.
- Connect you with an ecosystem, which allows you to integrate with network technology leaders and extend your automation efforts to other domains such as cloud or edge computing in the future.
Learn more about network automation!
Visit Network automation with Red Hat Ansible Automation Platform.
We will be at the Red Hat Summit in Denver! Join us to have a conversation about how to get started or continue your network automation journey. Network automation sessions are available in the catalog, including a session I will be delivering with Cisco Systems: “Run your network more efficiently with Red Hat and Cisco”.
Want to try hands-on labs? Check the network and edge self-paced labs we have available.
About the author
Dafné is a Principal Technical Marketing Manager for the Ansible Automation Platform Business Unit at Red Hat. Prior to Red Hat, Dafné worked at Cisco Systems, supporting service provider customers across the US, Canada and Latin America to design, implement and adopt automation, orchestration and network management solutions. She brings over 15 years of experience in the IT field with specializations in DevNet, ITIL, SCRUM, consulting, networking and automation.