How do I write a bash script to restart a process if it dies?


Question

I have a python script that'll be checking a queue and performing an action on each item:

# checkqueue.py
while True:
  check_queue()
  do_something()

How do I write a bash script that will check whether it's running and, if not, start it? Roughly the following pseudocode (or maybe it should do something like ps | grep?):

# keepalivescript.sh
if processidfile exists:
  if processid is running:
     exit, all ok

run checkqueue.py
write processid to processidfile

I'll call that from a crontab:

# crontab
*/5 * * * * /path/to/keepalivescript.sh

Solution

Avoid PID-files, crons, or anything else that tries to evaluate processes that aren't their children.

There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.

Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.

until myserver; do
    echo "Server 'myserver' crashed with exit code $?.  Respawning.." >&2
    sleep 1
done

The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.

Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
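A minimal sketch of the same loop wrapped in a reusable function, for supervising an arbitrary command. The function name `respawn` and the `RESPAWN_DELAY` variable are assumptions made for this illustration and are not part of the answer itself; the logic is the `until` loop described above.

```shell
#!/usr/bin/env bash
# respawn: run the given command, restarting it until it exits with status 0.
# RESPAWN_DELAY is a hypothetical knob: seconds to pause between restarts
# (defaults to 1, matching the "sleep 1" in the loop above).
respawn() {
    until "$@"; do
        # $? still holds the exit status of the command that just failed.
        echo "Server '$1' crashed with exit code $?.  Respawning.." >&2
        sleep "${RESPAWN_DELAY:-1}"
    done
}
```

Calling `respawn myserver` then behaves like the loop above: the function returns only once `myserver` exits with status 0.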

Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an @reboot rule. Open your cron rules with crontab:

crontab -e

Then add a rule to start your monitor script:

@reboot /usr/local/bin/myservermonitor
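If you start the monitor by hand instead of from cron, the "asynchronously" part might look like the sketch below. The helper name `start_monitor` and its log-file argument are assumptions for illustration only; `nohup` and `&` are the standard tools for keeping a job running after the terminal closes.

```shell
#!/usr/bin/env bash
# start_monitor: launch a monitor command in the background, immune to hangups.
# Usage: start_monitor LOGFILE COMMAND [ARGS...]
# LOGFILE (hypothetical) collects the monitor's stdout and stderr.
start_monitor() {
    monitor_log=$1
    shift
    # No PID is recorded on purpose: the monitor itself is the parent that
    # waits on the server, so nothing else needs to track it.
    nohup "$@" >>"$monitor_log" 2>&1 &
}
```

For example, `start_monitor /tmp/myservermonitor.log /usr/local/bin/myservermonitor` would return immediately while the monitor keeps running (the log path here is only a placeholder).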


Alternatively, look at inittab(5) and /etc/inittab. You can add a line there to have myserver start at a certain init level and be respawned automatically.


Edit.

Let me add some information on why not to use PID files. While they are very popular, they are also very flawed, and there's no reason why you wouldn't just do it the correct way.

Consider this:

  1. PID recycling (killing the wrong process):

    • /etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
    • A while later: foo dies somehow.
    • A while later: any random process that starts (call it bar) takes a random PID, imagine it taking foo's old PID.
    • You notice foo's gone: /etc/init.d/foo restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
  2. PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to problem 1.

  3. What if you don't even have write access or are in a read-only environment?

  4. It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
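To make problem 1 concrete, here is a sketch of the naive liveness check that PID-file scripts tend to rely on. The helper name `pidfile_says_running` is invented for this illustration; `kill -0` is the standard way to probe a PID without sending a signal.

```shell
#!/usr/bin/env bash
# pidfile_says_running: the naive check, shown here only to expose its flaw.
# It answers "does some process with this PID exist (and may I signal it)?",
# not "is that process still foo?".
pidfile_says_running() {
    pid=$(cat "$1" 2>/dev/null) || return 1
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

Write the PID of any unrelated live process you own into the file (your current shell's `$$`, say) and the check reports success, even though foo never ran. That is exactly the situation after foo's PID has been recycled by bar.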

See also: Are PID-files still flawed when doing it 'right'?

By the way, even worse than PID files is parsing ps! Don't ever do this.

  1. ps is very unportable. Although you find it on almost every UNIX system, its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!
  2. Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as an argument that happens to be the same as the PID you started your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
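The false-positive problem can be reproduced without touching a real process table. In the sketch below, the two ps-style lines are made up for illustration (user, PID, command), and both function names are hypothetical; the grep is what `ps aux | grep PID` boils down to.

```shell
#!/usr/bin/env bash
# naive_ps_grep: filters ps-style lines on stdin the way `ps aux | grep PID` does.
naive_ps_grep() {
    grep -- "$1"
}

# sample_ps_output: made-up ps-style lines, for illustration only.
sample_ps_output() {
    cat <<'EOF'
alice  1234  mydaemon --config /etc/mydaemon.conf
bob    5678  sleep 1234
EOF
}

# Count how many lines the naive filter treats as "PID 1234".
sample_ps_output | naive_ps_grep 1234 | wc -l
```

Both lines match: the daemon whose PID really is 1234, and bob's unrelated `sleep 1234`, which merely has the number as an argument.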

If you don't want to manage the process yourself, there are some perfectly good systems out there that will act as a monitor for your processes. Look into runit, for example.
