Why do processes spawned by cron end up defunct?
Problem description
I have some processes showing up as <defunct> in top (and ps). I've boiled things down from the real scripts and programs.
In my crontab:
* * * * * /tmp/launcher.sh /tmp/tester.sh
The contents of launcher.sh (which is of course marked executable):
#!/bin/bash
# the real script does a little argument processing here
"$@"
The contents of tester.sh (which is of course marked executable):
#!/bin/bash
sleep 27 & # the real script launches a compiled C program in the background
ps shows the following:
user 24257 24256 0 18:32 ? 00:00:00 [launcher.sh] <defunct>
user 24259 1 0 18:32 ? 00:00:00 sleep 27
Note that tester.sh does not appear; it has exited after launching the background job.
Why does launcher.sh stick around, marked <defunct>? It only seems to do this when launched by cron, not when I run it myself.
Additional note: launcher.sh is a common script in the system this runs on, and it is not easily modified. The other things (crontab, tester.sh, even the program that I run instead of sleep) can be modified much more easily.
Recommended answer
Because they haven't been the subject of a wait(2) system call.
Since someone may wait for these processes in the future, the kernel can't completely get rid of them. If it did, it would be unable to service the wait system call, because it would no longer have the exit status or any evidence that the process existed.
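To make the role of the saved exit status concrete, here is a minimal sketch using the shell's wait builtin, which is the user-visible front end to wait(2). The exit code 7 and the variable names are arbitrary choices for illustration. Whether a given shell reaps the child as soon as SIGCHLD arrives and caches the status, or only calls wait(2) when the builtin runs, is an implementation detail; either way, the status of a child that exited a while ago can still be collected.

```shell
#!/bin/sh
# Start a child that exits immediately with a distinctive status
# (7 is an arbitrary value chosen for this illustration).
sh -c 'exit 7' &
child=$!

# Do something else for a while; the child has long since exited.
sleep 1

# Collect the exit status of the already-finished child.
wait "$child"
status=$?
echo "child $child exited with status $status"
```

If nobody ever performed that collection step, the kernel would have to keep the process table entry around, which is what shows up as <defunct>.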
When you start one from the shell, your shell is trapping SIGCHLD and doing various wait operations anyway, so nothing stays defunct for long.
But cron isn't blocked in a wait call; it is sleeping, so the defunct child may stick around for a while until cron wakes up.
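The effect of a parent that is alive but never calls wait can be reproduced without cron. The following is a rough sketch, not a model of cron itself: it assumes Linux (it reads /proc/<pid>/status), and the helper proc_state and the temporary file are made up for this example. An inner shell starts a short-lived child and then replaces itself with a plain sleep via exec; that sleep inherits the child but has no code to reap it.

```shell
#!/bin/sh
# Linux-only sketch: reads the process state from /proc/<pid>/status.

# Print the one-letter state (R, S, Z, ...) of the given PID.
proc_state() {
    while read -r key val rest; do
        if [ "$key" = "State:" ]; then
            echo "$val"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

pidfile=$(mktemp)

# The inner shell starts a short-lived child, records its PID, then
# replaces itself with "sleep 3" via exec. That sleep becomes the
# child's parent but never calls wait(2).
sh -c 'sleep 1 & echo $! ; exec sleep 3' > "$pidfile" &
parent=$!

sleep 2                      # the short-lived child has exited by now
child=$(cat "$pidfile")
state=$(proc_state "$child")
echo "child $child state while its parent is alive: $state"

wait "$parent"               # let the non-waiting parent finish
rm -f "$pidfile"
```

While the non-waiting parent is still running, the exited child should report state Z, which is the same condition ps displays as <defunct>.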
Update: Responding to comment... Hmm. I did manage to duplicate the issue:
PPID PID PGID SESS COMMAND
1 3562 3562 3562 cron
3562 1629 3562 3562 \_ cron
1629 1636 1636 1636 \_ sh <defunct>
1 1639 1636 1636 sleep
So, what happened was, I think:
- cron forks and the cron child starts a shell
- the shell (1636) starts its own session and process group (sid and pgid 1636) and starts sleep
- the shell exits, SIGCHLD sent to cron 3562
- the signal is ignored or mishandled
- the shell turns zombie. Note that sleep is reparented to init, so when the sleep exits, init will get the signal and clean up. I'm still trying to figure out when the zombie gets reaped. Probably, with no active children, cron 1629 figures out it can exit; at that point the zombie will be reparented to init and get reaped. So now we wonder about the missing SIGCHLD that cron should have processed.
- It isn't necessarily vixie cron's fault. As you can see here, libdaemon installs a SIGCHLD handler during daemon_fork(), and this could interfere with signal delivery on a quick exit by intermediate 1629. Now, I don't even know if vixie cron on my Ubuntu system is even built with libdaemon, but at least I have a new theory. :-)
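The reparenting step in the list above (sleep ending up with PPID 1 after the shell that started it exits) can be observed directly. This is a hedged sketch that assumes Linux, since it reads /proc/<pid>/status; the helper parent_of and the inline stand-in for tester.sh are hypothetical names for illustration. On a classic setup the new parent is init (PID 1); under a child subreaper, such as some session managers or container runtimes, it may be a different PID.

```shell
#!/bin/sh
# Linux-only sketch: reads the parent PID from /proc/<pid>/status.

# Print the parent PID of the given PID.
parent_of() {
    while read -r key val rest; do
        if [ "$key" = "PPid:" ]; then
            echo "$val"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

pidfile=$(mktemp)

# Stand-in for tester.sh: start a background job, report its PID,
# and exit right away.
sh -c 'sleep 5 & echo $!' > "$pidfile" &
tester=$!
wait "$tester"               # we reap the stand-in shell, so no zombie here

orphan=$(cat "$pidfile")
new_parent=$(parent_of "$orphan")
echo "sleep $orphan: original parent $tester, current parent $new_parent"

kill "$orphan"               # clean up the background job
rm -f "$pidfile"
```

Because this script waits for the stand-in shell, the shell is reaped promptly; in the cron listing above, the corresponding shell is the process left <defunct>.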