How do I write a bash script to restart a process if it dies?


Problem description

I have a Python script that'll be checking a queue and performing an action on each item:

# checkqueue.py
while True:
  check_queue()
  do_something()

How do I write a bash script that will check if it's running and, if not, start it? Roughly the following pseudo code (or maybe it should do something like ps | grep?):

# keepalivescript.sh
if processidfile exists:
  if processid is running:
     exit, all ok

run checkqueue.py
write processid to processidfile

I'll call that from a crontab:

# crontab
*/5 * * * * /path/to/keepalivescript.sh

Solution

Avoid PID files, cron jobs, or anything else that tries to evaluate processes that aren't its own children.

There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.
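To make the restriction concrete, here is a minimal sketch (not from the original answer) of what a shell can and cannot wait for. It assumes a POSIX-style shell; the child command `sh -c 'exit 3'` is only a stand-in for a real process.

```shell
# Start a child in the background; $! holds its PID.
sh -c 'exit 3' &
child=$!

# wait blocks until the child ends and hands back its exit status.
status=0
wait "$child" || status=$?
echo "child exited with status $status"

# PID 1 (init) is not a child of this shell, so wait cannot report on it.
# POSIX shells return 127 for a PID they did not start.
not_child=0
wait 1 2>/dev/null || not_child=$?
echo "wait on a non-child returned $not_child"
```

The shell gets a reliable exit status for the process it started itself, and nothing useful for a process it did not start.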

Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.

until myserver; do
    echo "Server 'myserver' crashed with exit code $?.  Respawning.." >&2
    sleep 1
done

The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.

Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
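If you need this for more than one program, the loop can be wrapped in a small function. The following is only a sketch under stated assumptions: the name `respawn` and its delay argument are hypothetical and not part of the original answer; the loop itself is the same until loop as above.

```shell
# respawn DELAY COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND as a child of this shell and restarts it, after DELAY
# seconds, every time it exits with a non-zero status.
# Returns once COMMAND exits with status 0 (a clean shutdown).
respawn() {
    local delay="$1"
    shift
    until "$@"; do
        # $? still holds the exit status of the command that just failed.
        echo "'$1' crashed with exit code $?.  Respawning.." >&2
        sleep "$delay"
    done
}
```

For the queue checker from the question, this might be called as `respawn 1 python checkqueue.py`. Because the command runs as a direct child of the loop, no PID file or ps lookup is involved.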

Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an @reboot rule. Open your cron rules with crontab:

crontab -e

Then add a rule to start your monitor script:

@reboot /usr/local/bin/myservermonitor


Alternatively, look at inittab(5) and /etc/inittab. You can add a line there to have myserver start at a certain init level and be respawned automatically.
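As an illustration, on systems whose init reads /etc/inittab (SysV-style init), each entry has the form id:runlevels:action:process. A hypothetical entry might look like the following; the id `ms`, the runlevels and the path are placeholders, not taken from the original answer:

```text
# id:runlevels:action:process
ms:2345:respawn:/usr/local/bin/myserver
```

With the respawn action, init restarts the process whenever it terminates, regardless of exit status, which differs from the until loop above. On SysV init, running `telinit q` asks init to re-read the file.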


Edit.

Let me add some information on why not to use PID files. While they are very popular, they are also very flawed, and there's no reason why you wouldn't just do it the correct way.

Consider this:

  1. PID recycling (killing the wrong process):

    • /etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
    • A while later: foo dies somehow.
    • A while later: some unrelated process (call it bar) starts and is assigned a PID; imagine it gets foo's old PID.
    • You notice foo's gone: /etc/init.d/foo restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
  2. PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1.

  3. What if you don't even have write access or are in a read-only environment?

  4. It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
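The recycling problem in point 1 can be sketched in a few lines. This is only an illustration under stated assumptions: `pid_alive` is a hypothetical helper standing in for the check an init script might perform, and the current shell's PID (`$$`) plays the role of the unrelated process bar that ended up with foo's old PID.

```shell
# Hypothetical PID file; a real init script would use /var/run/foo.pid.
pidfile=$(mktemp)

# Naive liveness check: kill -0 sends no signal, it only reports whether
# a process with that PID exists and may be signalled.
pid_alive() {
    kill -0 "$(cat "$1")" 2>/dev/null
}

# Simulate recycling: foo is long gone and its old PID now belongs to an
# unrelated process. The current shell ($$) stands in for "bar".
echo "$$" > "$pidfile"

if pid_alive "$pidfile"; then
    verdict="foo looks alive"    # wrong: the PID belongs to bar
else
    verdict="foo is dead"
fi
echo "$verdict"
rm -f "$pidfile"
```

The check succeeds even though foo is not running, because a PID alone says nothing about which program currently owns it.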

See also: Are PID-files still flawed when doing it 'right'?

By the way, even worse than PID files is parsing ps! Don't ever do this.

  1. ps is very unportable. While you find it on almost every UNIX system, its arguments vary greatly if you want non-standard output. And the standard output is meant ONLY for human consumption, not for scripted parsing!
  2. Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as an argument that happens to be the same as the PID you started your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
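The false-positive problem can be shown without running ps at all. The sketch below greps a hand-written, hypothetical ps-style listing (the users, PIDs and commands are invented for illustration) for the daemon's PID:

```shell
# Hypothetical, hand-written ps-style output (USER PID COMMAND).
# PID 1234 is the daemon we actually care about.
ps_output='alice 1234 myserver --config /etc/myserver.conf
bob   5678 backup --job-id 1234
alice 9012 grep 1234'

# The naive check: does any line mention our PID?
matches=$(printf '%s\n' "$ps_output" | grep -c 1234)
echo "lines matching 1234: $matches"
```

Three lines match: the daemon, an unrelated job that merely has 1234 as an argument, and the grep itself. Only one of them is the process you meant.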

If you don't want to manage the process yourself, there are some perfectly good systems out there that will act as a monitor for your processes. Look into runit, for example.
