What are zombie processes?
Zombie processes are processes that have finished their task but whose parent has not yet collected their exit status (reaped them), most likely because the parent is hung or buggy. They can also indicate buggy code. Most users don't have a good understanding of what zombie processes are and how they affect a Linux host. First and foremost, a small number of zombie processes (say, ten) does not contribute to system load. In fact, even thousands of zombies wouldn't contribute to system load.
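If you'd like to see a zombie for yourself, the sketch below creates a harmless, short-lived one. It relies on a common shell trick, not anything from the original setup: a subshell starts a background `sleep 1` and then uses `exec` to turn itself into `sleep 10`. Because `sleep` never calls `wait()`, the finished child stays in the process table. The sketch assumes a Linux host with the procps-ng `ps`.

```shell
#!/bin/bash
# Create a harmless, short-lived zombie for demonstration purposes.
# The subshell starts "sleep 1" in the background, then replaces itself
# with "sleep 10" via exec. sleep never calls wait(), so once "sleep 1"
# exits it remains a zombie for as long as its parent is alive.
( sleep 1 & exec sleep 10 ) &
parent=$!

# Give the child time to exit and become a zombie.
sleep 2

# Show the parent and its children; the zombie's STAT starts with Z.
ps -o pid,ppid,stat,args --ppid "$parent" -p "$parent"
```

On a typical host, the zombie disappears shortly after `sleep 10` exits.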
Zombie processes consume almost no resources. Each zombie process is, however, still allocated a process ID number, or PID. On 32-bit systems, the highest PID available is 32767. On 64-bit systems, that limit can be raised to just over 4 million (`pid_max` can be set as high as 4194304).
Each zombie also keeps a small amount of kernel memory for its process-table entry. Realistically, it would take tens of thousands, maybe hundreds of thousands, of zombie processes to cause significant resource exhaustion.
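The sketch below shows one way to check both numbers on a live host: how many zombies currently exist, and the PID ceiling in `/proc/sys/kernel/pid_max`. The helper names `count_zombies` and `pid_ceiling` are hypothetical, and the code assumes the procps-ng `ps`.

```shell
#!/bin/bash
# Hypothetical helper: count processes whose state (STAT) starts with Z.
count_zombies() {
    # stat= prints only the state column, without a header line
    ps -eo stat= | awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Hypothetical helper: print the kernel's PID wrap value.
# PIDs equal to or above pid_max are never allocated.
pid_ceiling() {
    cat /proc/sys/kernel/pid_max
}

echo "zombies: $(count_zombies)"
echo "pid_max: $(pid_ceiling)"
```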
There are two scenarios I'll discuss here. In the first, a process spawns numerous child processes, and then the parent crashes or dies without reaping them. This usually indicates a bug in the program or code. When this happens, PID 1 (the init process) takes ownership of the children. For the purpose of this how-to, the quickest way to get rid of these zombies is to reboot. It is also possible to create a dummy process, pass ownership of the zombie processes to that dummy PID, and clean them up there, but that is out of scope here. Reboot and be done with it!
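To check whether you're in this first scenario, you can list zombies whose parent PID is 1. This is a minimal sketch assuming the procps-ng `ps`; `orphaned_zombies` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/bash
# Hypothetical helper: print PID, PPID, STAT and command name for every
# zombie whose parent is PID 1 (the init process).
orphaned_zombies() {
    # The trailing "=" on each column suppresses the header line
    ps -eo pid=,ppid=,stat=,comm= | awk '$2 == 1 && $3 ~ /^Z/'
}

orphaned_zombies
```

An empty result means no zombies are currently owned by PID 1.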
The second scenario is when a process is hung to the point that the OS sees it as running, but the process isn't actually doing anything. The example I'm using for this discussion is the Automatic Bug Reporting Tool daemon (abrtd) that ships with Red Hat Enterprise Linux and some other distributions. This is a great tool, and I like having it up and running on a system because it gives me, as a sysadmin, a better view into what is oopsing, or crashing, without invoking the crash kernel.
In my environment, this daemon is also a bit of an Achilles heel. By default, abrtd only creates information for signed applications. Any app can be signed to generate a bug, but if an unsigned app triggers abrtd, the daemon goes through the motions of creating a bug report and then removes everything it created. This behavior can leave an action for an unsigned app hanging.
Let's take a look at an example. A user submits an incident report saying that the system is slow because of six zombie processes. There are some dead giveaways that abrtd is having issues. When you run su - to switch to root, you'll see the following:
'abrt-cli status' timed out
If we check on the abrtd process, we can see that it's still running, but there's a child process that's been running since 5/30:
[root@$HOSTNAME ~]# systemctl status abrtd
● abrtd.service - ABRT Automated Bug Reporting Tool
Loaded: loaded (/usr/lib/systemd/system/abrtd.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2019-04-27 08:36:47 EDT; 2 months 13 days ago
Main PID: 1161 (abrtd)
Tasks: 12
Memory: 34.0M
CGroup: /system.slice/abrtd.service
├─ 1161 /usr/sbin/abrtd -d -s
├─60777 abrt-server -s
├─60867 /usr/libexec/abrt-handle-event -i -e post-create -- /var/spool/abrt/unsigned-app-2019-05-30-01:48:24-57451
├─64714 abrt-server -s
├─68157 abrt-server -s
├─70725 abrt-server -s
├─74101 abrt-server -s
├─77136 abrt-server -s
├─81417 abrt-server -s
├─84637 abrt-server -s
├─88183 abrt-server -s
└─90022 abrt-server -s
So abrtd is still technically running, but that post-create process (PID 60867) has left the daemon in a state where new ABRT crash reports tried to run but turned into zombies. At this point, you can restart the abrtd service, and that action will clear all of the zombie processes.
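Before restarting, you may want to confirm which of the daemon's children are actually zombies. The sketch below assumes the procps-ng `ps` and `pgrep`; `child_states` is a hypothetical helper, and the restart is left commented out so that nothing changes until you run it yourself as root.

```shell
#!/bin/bash
# Hypothetical helper: list the direct children of a PID with their state,
# so zombies (STAT starting with Z) are easy to spot.
child_states() {
    ps -o pid=,stat=,comm= --ppid "$1"
}

# Find the oldest process named exactly "abrtd" (empty if it isn't running).
abrtd_pid="$(pgrep -o -x abrtd)"

if [ -n "$abrtd_pid" ]; then
    child_states "$abrtd_pid"
    # After reviewing the output, restart the service as root:
    # systemctl restart abrtd
fi
```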
But if you didn't know that was the case, here's how you can track down which PID is the zombies' parent using the ps -xal command. This command outputs a lot of information, so I'm only showing the columns we need:
[root@$HOSTNAME ~]# ps -xal | awk '{ print $4 " " $10 " " $13 }' | sort -n
1739 Ssl+ java
1903 S bin/rscd
1903 S bin/rscd
2391 Ssl+ node
2816 Ssl+ java
2889 Ssl+ java
3785 Ss appcollect
3785 Ss appconfigcollect
3926 Ssl+ java
4696 Ss /bin/sh
4731 S bin/bash
4827 Sl /myappbinaries/jre/bin/java
7074 Ss+ httpd
7095 S+ httpd
In the ps output, field 4 is the parent's PID (PPID), field 10 is the process state (STAT; you'd be looking for processes in the Z state), and field 13 is the command name. The awk filter prints these three fields in that order. Using the parent PID, you can then kill the parent process, and its zombie children will go away too. If that parent PID is 1, however, a reboot will be necessary.
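Counting fields in `ps -xal` output works, but an alternative is to ask `ps` for exactly the columns you need and let `awk` keep only the zombies. The sketch below (hypothetical helper `zombie_parents`, assuming the procps-ng `ps`) prints each zombie parent's PID, how many zombies it holds, and its command name, which is what you need before deciding what to kill.

```shell
#!/bin/bash
# Hypothetical helper: print "PPID COUNT COMMAND" once for every process
# that currently has at least one zombie child.
zombie_parents() {
    # ppid= and stat= print bare values with no header line
    ps -eo ppid=,stat= |
        awk '$2 ~ /^Z/ { count[$1]++ } END { for (p in count) print p, count[p] }' |
        while read -r ppid n; do
            # Look up the parent's command name (empty if it has since exited)
            printf '%s %s %s\n' "$ppid" "$n" "$(ps -o comm= -p "$ppid")"
        done
}

zombie_parents
```

A parent PID of 1 in this output is the case where, per the advice above, a reboot is the fallback.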
Being in operations, we don't always have the luxury of rebooting whenever we like. Personally, I feel that rebooting should be the last resort. Reboots hide a multitude of sins and leave those sins to present themselves at the most inopportune time, usually late at night or on a holiday weekend!
Oh, and the load on this system didn't drop one bit after restarting abrtd and clearing those six zombies.
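Linux load average counts tasks that are runnable (R) or in uninterruptible sleep (D); zombies (Z) are in neither group, which is consistent with the load staying put. The sketch below (hypothetical helper `load_breakdown`, assuming the procps-ng `ps`) prints the load averages next to a count of processes in each of those states. Note that `ps -e` lists processes rather than individual threads, so the counts are only a rough guide.

```shell
#!/bin/bash
# Hypothetical helper: show the load averages alongside the number of
# processes in the states that feed them (R, D) and in the zombie state (Z).
load_breakdown() {
    # The first three fields of /proc/loadavg are the 1-, 5- and 15-minute averages
    read -r one five fifteen _ < /proc/loadavg
    echo "loadavg: $one $five $fifteen"
    ps -eo stat= | awk '
        $1 ~ /^R/ { r++ }
        $1 ~ /^D/ { d++ }
        $1 ~ /^Z/ { z++ }
        END { printf "running: %d  uninterruptible: %d  zombie: %d\n", r, d, z }'
}

load_breakdown
```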
About the author
Jake has been an Enterprise Linux Systems Administrator for 13 years. His goal has been to automate himself out of a job, but try as he might, he hasn't managed it yet. In his free time, he enjoys fishing, kayaking, and knife making.