On Unix and Unix-like computer operating systems, a zombie process or defunct process is a process that has completed execution (via the exit system call) but still has an entry in the process table: it is a process in the “Terminated state”.
The term zombie process derives from the common definition of zombie - an undead person. In the term’s metaphor, the child process has “died” but has not yet been “reaped”. Also, unlike normal processes, the kill command has no effect on a zombie process.
Source: wikipedia.org, 2018-04-12
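A zombie is easy to reproduce on a test machine. The POSIX shell sketch below assumes Linux with procps-ng ps. The function name zombie_demo and the sleep durations are illustrative choices, not part of any tool. The subshell starts sleep 1 in the background and then uses exec to become sleep 5, a program that never calls wait(). The finished child should therefore stay in state Z until its parent exits, even after kill -9.

```sh
#!/bin/sh
# Sketch: create a zombie on purpose and show that kill cannot remove it.
# Assumes Linux with procps-ng ps; zombie_demo is a hypothetical name.

zombie_demo() {
    # The subshell forks "sleep 1", then replaces itself with "sleep 5".
    # sleep never calls wait(), so its finished child is left as a zombie.
    (sleep 1 & exec sleep 5) &
    parent=$!
    sleep 2                                    # give "sleep 1" time to exit

    zpid=$(ps -o pid= --ppid "$parent" | tr -d ' ')
    ps -o pid,ppid,stat,cmd -p "$zpid"         # STAT is Z, CMD ends in <defunct>

    kill -9 "$zpid"                            # the signal is accepted...
    ps -o stat= -p "$zpid"                     # ...but the entry is still Z

    # Once the parent exits, the zombie is re-parented to init (or a
    # subreaper), which reaps it.
    wait "$parent"
}

zombie_demo
```

Running it should print the zombie's ps line with Z in the STAT column, then Z once more after the kill -9.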
After the zombie is removed, its process identifier (PID) and entry in the process table can then be reused. However, if a parent fails to call wait, the zombie will be left in the process table, causing a resource leak.
As with other resource leaks, the presence of a few zombies is not worrisome in itself, but may indicate a problem that would grow serious under heavier loads. Since there is no memory allocated to zombie processes – the only system memory usage is for the process table entry itself – the primary concern with many zombies is not running out of memory, but rather running out of process table entries, concretely process ID numbers.
Source: wikipedia.org, 2018-04-12
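To see how close a host is to the PID limit the quote describes, a rough sketch like the following compares the number of PIDs in use with /proc/sys/kernel/pid_max. It assumes Linux and procps-ng ps. It counts threads, because each thread also takes a PID. The helper name pid_usage_report is hypothetical.

```sh
#!/bin/sh
# Sketch: how many PIDs are in use, how many processes are zombies,
# and what the kernel's PID limit is. Assumes Linux with procps-ng ps.

pid_usage_report() {
    # PIDs wrap around at this value, so it bounds the number of PIDs in use.
    pid_max=$(cat /proc/sys/kernel/pid_max)
    # Every thread consumes a PID, so count threads (-L), not just processes.
    in_use=$(ps -eL --no-headers | wc -l)
    # Process entries whose state starts with Z.
    zombies=$(ps -eo stat= | grep -c '^Z')
    echo "pid_max=$pid_max in_use=$in_use zombies=$zombies"
}

pid_usage_report
```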
The number of processes that a user can run can be checked with ulimit.
Check the user process limit:
ulimit -u
Example: check all limits
tan@omega:~$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31775
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31775
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Example: check the user process limit
tan@omega:~$ ulimit -u
31775
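The limit only matters relative to what is already running. This sketch assumes Linux and procps-ng ps, and nproc_headroom is a hypothetical name. It reads the same soft limit that ulimit -u reports in bash from /proc/self/limits, so it does not depend on which flag a given shell's ulimit uses. It then counts the current user's threads, since Linux applies this limit to threads of the real user ID.

```sh
#!/bin/sh
# Sketch: compare the current user's thread count with the soft process
# limit (the value "ulimit -u" prints in bash). Assumes Linux, procps-ng ps.

nproc_headroom() {
    # Soft limit from the "Max processes" row of /proc/self/limits
    # (awk inherits the shell's limits); a number or "unlimited".
    limit=$(awk '/^Max processes/ {print $3}' /proc/self/limits)
    # Count threads (-L) belonging to the real user ID (-U).
    used=$(ps -L -U "$(id -ru)" --no-headers | wc -l)
    if [ "$limit" = unlimited ]; then
        echo "used=$used limit=unlimited"
    else
        echo "used=$used limit=$limit free=$(( $limit - $used ))"
    fi
}

nproc_headroom
```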
Detect Zombie Processes
How can you detect zombies? Zombies can be identified in the output of the Unix ps command by a “Z” in the “STAT” column (labelled “S” in ps -l output).
vinh@omega:/etc/opt/six/fo/monit> ps -el | grep 'Z'
F S   UID     PID    PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 Z     0  363501    2617  0  80   0 -     0 exit   ?        00:00:00 sh <defunct>
4 Z     0  431579  130477  3  80   0 -     0 exit   ?        00:00:00 docker-current <defunct>
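Note that grep 'Z' matches any line containing a capital Z: the header matches because of the SZ column, and so would any process whose command name contains a Z. A slightly stricter sketch tests only the state column. It assumes procps-ng ps, and list_zombies is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: list zombies by testing only the process state column.
# Assumes procps-ng ps; list_zombies is a hypothetical helper name.

list_zombies() {
    # The trailing "=" on each field suppresses the header line.
    # Print rows whose third field (STAT) starts with Z.
    ps -eo pid=,ppid=,stat=,comm= | awk '$3 ~ /^Z/'
}

list_zombies
```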
You can also use top, whose summary header includes a zombie count. With top -H it counts threads instead of processes:
top - 13:13:29 up 167 days, 21:45,  5 users,  load average: 3.98, 4.28, 4.05
Threads: 40110 total,   4 running, 40104 sleeping,   0 stopped,   2 zombie
%Cpu(s):  3.7 us,  1.6 sy,  0.1 ni, 94.4 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 13112041+total,  8070096 free, 64865496 used, 58184820 buff/cache
KiB Swap:  4194300 total,  4139660 free,    54640 used. 65108580 avail Mem
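For scripts, top can also run non-interactively. A minimal sketch, assuming procps-ng top, where -b selects batch mode and -n 1 takes a single snapshot. The name top_zombie_count is hypothetical. The awk field position relies on the summary line ending in "<N> zombie", as in the top header shown here.

```sh
#!/bin/sh
# Sketch: read the zombie count from top's summary line non-interactively.
# Assumes procps-ng top; -b is batch mode, -n 1 takes one snapshot.

top_zombie_count() {
    # The "Tasks:" line (or "Threads:" with -H) ends in "<N> zombie";
    # print the field before "zombie" and stop at the first match.
    top -b -n 1 | awk '/zombie/ { print $(NF - 1); exit }'
}

top_zombie_count
```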
The number of zombies should be monitored. On another server, top (without -H) reports 399 zombies:
top - 13:39:28 up 183 days,  2:56,  2 users,  load average: 3.22, 2.80, 2.45
Tasks: 2023 total,   2 running, 1622 sleeping,   0 stopped, 399 zombie
%Cpu(s):  4.1 us,  1.2 sy,  0.1 ni, 94.3 id,  0.2 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 13112041+total,  3482208 free, 74123296 used, 53514912 buff/cache
KiB Swap:  4194300 total,  3827404 free,   366896 used. 56010040 avail Mem
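A monitoring job needs a pass/fail answer rather than a raw count. The sketch below assumes procps-ng ps and exits 0 while the zombie count is at or below a threshold and 1 above it. The name check_zombies and the default threshold of 10 are hypothetical and should be tuned per host. A tool that runs external programs and alerts on their exit status, such as monit's check program, could call it.

```sh
#!/bin/sh
# Sketch: zombie check for a monitoring job. Returns 0 while the zombie
# count is at or below the threshold and 1 above it.
# The default of 10 is an arbitrary, hypothetical value; tune it per host.

check_zombies() {
    threshold=${1:-10}
    count=$(ps -eo stat= | grep -c '^Z')
    echo "zombies=$count threshold=$threshold"
    [ "$count" -le "$threshold" ]
}

check_zombies "$@"
```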
To check all your servers with Ansible:
vinh@alpha:~> /bin/ansible all -m shell -a "ps -el | grep 'Z' | sed 1d | wc -l"
The sed 1d command drops the first line of output. That line is the ps header, which grep 'Z' also matches because of the SZ column, so it must not be counted.
The quotes are from an article by Benjamin Cane.
If the parent process of the zombie or zombies is still active (not process ID 1), then this is an indication that the parent process is stalled on a certain task and has not yet read the exit status of its child processes. At this point the resolution is extremely situational; you can use the strace command to attach to the parent process and troubleshoot from there.
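Before attaching strace, it helps to know which parents are holding zombies. This sketch assumes procps-ng ps, and zombie_parents is a hypothetical helper name. It groups zombies by PPID and prints each parent's command name. A PPID of 1 means the original parent is already gone.

```sh
#!/bin/sh
# Sketch: group zombies by parent PID and show each parent's command,
# to decide which parent to inspect. Assumes procps-ng ps.

zombie_parents() {
    # Count zombies per parent PID, then look up each parent's name.
    ps -eo ppid=,stat= |
    awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print p, n[p] }' |
    while read -r ppid count; do
        cmd=$(ps -o comm= -p "$ppid")
        echo "ppid=$ppid zombies=$count parent=$cmd"
    done
}

zombie_parents
```

For a parent listed here, strace -p <PPID> attaches to it, which usually requires root or ptrace permission.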
You may also be able to make the parent process exit cleanly, taking its zombie children with it, by gracefully stopping or restarting it.
Example: find the parent of a zombie with ps and pstree:
vinh@omega:/etc/opt/six/fo/monit> ps -el | grep 'Z'
F S   UID     PID    PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 Z     0  363501    2617  0  80   0 -     0 exit   ?        00:00:00 sh <defunct>
4 Z     0  431579  130477  3  80   0 -     0 exit   ?        00:00:00 docker-current <defunct>
vinh@omega:/etc/opt/six/fo/monit> pstree -p -s 363501
systemd(1)───salt-minion(2481)───salt-minion(2617)───sh(363501)
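On hosts without pstree, the same ancestry can be walked with ps alone. A minimal sketch, assuming procps-ng ps; ancestors is a hypothetical helper name. It prints the chain from the given PID upward, which is the reverse of pstree's left-to-right order.

```sh
#!/bin/sh
# Sketch: print a process's chain of ancestors using only ps, similar in
# spirit to "pstree -p -s PID". Assumes procps-ng ps.

ancestors() {
    pid=$1
    # Walk PPID links upward; PID 1 (whose PPID is 0) ends the chain.
    # An empty lookup (process gone) also ends it via ${pid:-0}.
    while [ "${pid:-0}" -gt 0 ]; do
        echo "$pid $(ps -o comm= -p "$pid")"
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

ancestors "$$"
```

On the host above, ancestors 363501 should list sh, the two salt-minion processes, and then systemd, as long as the zombie still exists.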
If the parent process is no longer active, then cleanup becomes a choice: at this point you can leave the zombie processes on your system, or you can simply reboot. A zombie whose original parent has exited is normally re-parented to process ID 1, and if that process does not reap it, it is not going to be cleaned up without rebooting the system. If the zombie processes are only in small numbers and are not recurring or multiplying, then it may be best to leave them be until the next reboot. If, however, they are multiplying or present in large numbers, then this is an indication that there is a significant issue with your system.