It may seem that with automation and agility, the Information Technology Infrastructure Library (ITIL) is outdated, but I don't think we've seen the end of this methodology yet. ITIL has served numerous IT organizations as a guideline and blueprint for processes, and it continues to be a significant tool for the IT professional. You can modernize your approach to ITIL with the automation tools provided by Red Hat Ansible Automation Platform and the principles of infrastructure-as-code (IaC).
What is incident and problem management?
In a nutshell, problem management is the proactive sibling of reactive incident management, but what are incident and problem management, exactly?
Incident management: Detecting and handling issues negatively impacting the quality, availability or performance of any service. The handling encompasses restoring the service, generally based on a written script (a.k.a. documentation) followed by a support person. For example, if users can’t access an application, the script will describe how to troubleshoot this application and restart it if it has crashed.
Problem management: This is a follow-up to incident management, and consists of analyzing the root cause of recurring or important incidents, and deriving action plans to fix them so that they don’t appear again. It is one of my favorite ITIL processes because it is about avoiding issues instead of fixing them (who doesn’t want to avoid problems?), and is a basis for continuous improvement! Sadly, it is seldom done properly because once the incident is gone, it is difficult to find the time to do the work that prevents it from happening again.
Continuing with our previous example, we’d first find out why the application crashes regularly (e.g. because it runs out of memory), and fix the underlying root causes (e.g. increase the memory on the server, monitor memory consumption and potentially fix a memory leak in the application).
Incident management and automation
It is relatively straightforward to automate manual steps described in the aforementioned scripts using Ansible Playbooks.
Those playbooks can be made available to your support personnel through the role-based access control (RBAC) system of Red Hat Ansible Automation Platform, either directly in the web UI of Ansible Automation Platform, or through an API integration within your ticketing system (or other portal).
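As a rough illustration of the API integration mentioned above, here is a minimal sketch of how a ticketing system might prepare a request to launch a job template. It assumes the automation controller exposes a job template launch endpoint under `/api/v2/job_templates/<id>/launch/` and accepts a bearer token; the exact path prefix can differ between platform versions. The host name, template ID, token and variable names are all hypothetical.

```python
import json
import urllib.request


def build_launch_request(controller_url, template_id, token, extra_vars):
    """Build (but do not send) a POST request that launches a job template."""
    # Assumption: the launch endpoint lives under /api/v2/ on your controller.
    url = f"{controller_url.rstrip('/')}/api/v2/job_templates/{template_id}/launch/"
    body = json.dumps({"extra_vars": extra_vars}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_launch_request(
    "https://controller.example.com",  # hypothetical controller host
    42,  # hypothetical job template ID
    "REPLACE_WITH_TOKEN",  # placeholder, never hard-code real tokens
    {"ticket_id": "INC0001", "target_host": "app01.example.com"},
)
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

Passing the ticket number as an extra variable is one way to let the playbook write its findings back to the right incident.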
In a first approach, you can complete the incident ticket with additional information gathered by Ansible Automation Platform so that you can watch for negative effects while you build confidence in your own automation.
Once you feel confident enough, you can skip the human step completely and use Event-Driven Ansible to automatically trigger the automation put in place.
NOTE: You might be tempted to use Ansible Automation Platform itself as your monitoring system to detect issues and create incidents accordingly. While this is technically possible, the performance impact of close monitoring would likely be prohibitive, and there are much better solutions. Instead, you should integrate a proper monitoring system with Event-Driven Ansible to trigger automation, as described above.
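The staged approach in this section, first enriching the ticket and only later remediating without a human, can be sketched as a small decision function. This is a simplified model, not how Event-Driven Ansible is configured in practice (that is done with rulebooks). The `Incident` class, the `TRUSTED_KINDS` set and the two callables standing in for playbook launches are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    ticket_id: str
    kind: str
    notes: list = field(default_factory=list)
    resolved_by: str = ""


# Hypothetical: incident kinds whose automation you trust to run unattended.
TRUSTED_KINDS = {"app_down"}


def handle(incident, diagnostics, remediate):
    """Always enrich the ticket; remediate automatically only for trusted kinds."""
    incident.notes.append(diagnostics(incident))
    if incident.kind in TRUSTED_KINDS:
        remediate(incident)
        incident.resolved_by = "automation"
    else:
        incident.resolved_by = "pending human review"
    return incident


# The lambdas stand in for playbook launches.
inc = handle(Incident("INC0001", "app_down"), lambda i: "service inactive", lambda i: None)
other = handle(Incident("INC0002", "disk_full"), lambda i: "95% used", lambda i: None)
```

Growing the trusted set one incident kind at a time mirrors the idea of building confidence before removing the human step.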
Even if incidents are resolved automatically, it remains important to keep a record so you can analyze them. You want to know if Ansible Automation Platform has been quietly restarting an application 100 times a day. If that is happening, the application needs to be fixed. This brings us to our next topic.
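To make the value of such records concrete, here is a minimal sketch that counts automatic remediations per application and flags the ones that recur often enough to deserve a problem review. The record layout (`application` and `action` keys) and the threshold are assumptions for illustration; in practice the data would come from your ticketing system or job history.

```python
from collections import Counter


def frequent_remediations(records, threshold):
    """Return (application, action) pairs seen at least `threshold` times."""
    counts = Counter((r["application"], r["action"]) for r in records)
    return {key: n for key, n in counts.items() if n >= threshold}


# Hypothetical sample of automatically resolved incidents.
records = [
    {"application": "billing", "action": "restart"},
    {"application": "billing", "action": "restart"},
    {"application": "billing", "action": "restart"},
    {"application": "portal", "action": "restart"},
]

candidates = frequent_remediations(records, threshold=3)
```

Each entry in `candidates` is a candidate for problem management rather than for yet another automatic restart.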
Problem management and automation
The relationship between problem management and automation might not be that obvious, so let's take a moment to clarify it.
As your environment becomes increasingly automated, any incident you might encounter is potentially due to:
- An error or a glitch in your existing automation
- A manual intervention due to a gap in your automation
Also, as we’ve seen in the previous article of this series, release management encompasses regular testing of your automation in a pipeline.
That means that in addition to searching for the root cause of your problem, you’ll have to think about its impact on your automation and extend the corrective actions along the lines of:
- How to fix the automation to avoid the incident happening again
- Which automation to add to avoid the manual mishap in the future
- And, most important, which test case to add to your pipeline to detect the issue before it can happen again in production. A developer would tell you that you’re avoiding regressions, making sure that your automation always improves.
The last point is why I recommend creating a simple test pipeline (known as "smoke tests"), and expanding it step-by-step with test cases that catch errors that actually happen in production. This avoids having too many theoretical test cases which never catch any issue, because test cases also need to be maintained and require additional effort. Problem management is the perfect place to collect those real test cases.
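The idea of starting with a smoke test and adding one test case per real problem can be sketched as follows. This is only an illustration using Python's `unittest`; a real pipeline would test the automation content itself. The `DEPLOYED_SETTINGS` dictionary and the 4096 MB value are hypothetical, and the second test echoes the out-of-memory example from earlier in this article.

```python
import unittest

# Hypothetical stand-in for the settings your automation would deploy.
DEPLOYED_SETTINGS = {"service_state": "started", "memory_limit_mb": 4096}


class SmokeTest(unittest.TestCase):
    """The simple starting point: is the service meant to be running?"""

    def test_service_is_started(self):
        self.assertEqual(DEPLOYED_SETTINGS["service_state"], "started")


class RegressionTests(unittest.TestCase):
    """Grown one test at a time, each added after a problem review."""

    def test_memory_limit_not_below_root_cause_fix(self):
        # Added after a (hypothetical) review found the app ran out of memory.
        self.assertGreaterEqual(DEPLOYED_SETTINGS["memory_limit_mb"], 4096)


def run_suite():
    """Run the smoke test and the regression tests together."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(SmokeTest))
    suite.addTests(loader.loadTestsFromTestCase(RegressionTests))
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```

If someone later lowers the memory limit again, the regression test fails in the pipeline instead of in production.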
Wrap up
We’ve seen how to improve and optimize incident management with automation and Event-Driven Ansible, working towards a self-healing environment. We've also talked about how problem management and automation can be combined to support continuous improvement and avoid regressions in your automation content.
Automation can be a long but rewarding journey, and Red Hat Services would be happy to help you introduce automation in your enterprise, with or without ITIL.
About the author
I have been at Red Hat since 2013, where I am responsible within Red Hat Consulting EMEA for creating Services Solutions covering automation and edge topics. I am also the Automation Community of Practice Manager, addressing Red Hat automation practitioners around the globe.
You may address me in English, French or German.