How do I write a bash script to restart a process if it dies?


Problem description

I have a Python script that checks a queue and performs an action on each item:

# checkqueue.py
while True:
  check_queue()
  do_something()

How do I write a bash script that checks whether it's running and, if not, starts it? Roughly the following pseudocode (or maybe it should do something like ps | grep?):

# keepalivescript.sh
if processidfile exists:
  if processid is running:
     exit, all ok

run checkqueue.py
write processid to processidfile

I'll call that from a crontab:

# crontab
*/5 * * * * /path/to/keepalivescript.sh

Solution

Avoid PID files, cron jobs, or anything else that tries to evaluate processes that aren't its own children.

There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.
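A minimal sketch of that rule, assuming bash: the shell below collects the exit status of a child it started itself, then tries the same on PID 1, which is not its child. The subshell `( exit 7 )` is only a stand-in for a real program.

```shell
#!/usr/bin/env bash
# Start a child that exits with a known status, then collect that status.
( exit 7 ) &
child_pid=$!
child_status=0
wait "$child_pid" || child_status=$?
echo "child exited with status $child_status"

# PID 1 is not a child of this shell, so wait has nothing to report on it.
foreign_status=0
wait 1 2>/dev/null || foreign_status=$?
echo "wait on a non-child returned $foreign_status"
```

The first wait reports the child's real exit status (7). The second fails with status 127, because exit statuses are only handed to the parent.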

Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.

until myserver; do
    echo "Server 'myserver' crashed with exit code $?.  Respawning.." >&2
    sleep 1
done

The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.
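To watch the exit-status logic in isolation, here is a small sketch in which a hypothetical shell function, flaky_server, stands in for myserver. It "crashes" with exit code 3 on its first two runs and exits cleanly on the third; the function and its counter are assumptions made up for this demonstration.

```shell
#!/usr/bin/env bash
# flaky_server is a hypothetical stand-in for myserver: it fails with
# exit code 3 on the first two runs and exits cleanly (0) on the third.
attempts=0
flaky_server() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ] && return 0
    return 3
}

until flaky_server; do
    echo "Server 'flaky_server' crashed with exit code $?.  Respawning.." >&2
    # sleep 1 is left out here only to keep the demonstration instant
done

echo "clean exit after $attempts attempts"
```

The loop body runs twice, once per crash, and the loop stops as soon as the stand-in returns 0.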

Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.

Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an @reboot rule. Open your cron rules with crontab:

crontab -e

Then add a rule to start your monitor script:

@reboot /usr/local/bin/myservermonitor
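What such a monitor script might contain, and how starting it asynchronously looks, can be sketched as follows. Taking the server command from the function's arguments is an assumption made so the sketch can be tried with a stand-in command; a real /usr/local/bin/myservermonitor would more likely call myserver directly in the loop.

```shell
#!/usr/bin/env bash
# Sketch of a monitor. The command to supervise is passed as arguments,
# which is an assumption of this example rather than part of the answer.
monitor() {
    until "$@"; do
        echo "Server '$1' crashed with exit code $?.  Respawning.." >&2
        sleep 1
    done
}

# Stand-in usage: `true` exits 0 at once, so the monitor returns right away.
monitor true &                 # start the monitor asynchronously
monitor_pid=$!
monitor_status=0
wait "$monitor_pid" || monitor_status=$?   # we are its parent, so we may wait
echo "monitor exited with status $monitor_status"
```

With a real server in place of `true`, the background monitor keeps running and respawning until the server exits with status 0.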


Alternatively, look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.


Edit.

Let me add some information on why not to use PID files. While they are very popular, they are also very flawed, and there's no reason not to just do it the correct way.

Consider this:

  1. PID recycling (killing the wrong process):

    • /etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
    • A while later: foo dies somehow.
    • A while later: any random process that starts (call it bar) takes a random PID; imagine it taking foo's old PID.
    • You notice foo's gone: /etc/init.d/foo restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.

  2. PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1.

  3. What if you don't even have write access or are in a read-only environment?

  4. It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
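The recycling problem in point 1 can be sketched like this. It is an illustration only: the temporary PID file is hypothetical, and this shell's own PID ($$) plays the part of bar, the unrelated process that happens to hold foo's old number.

```shell
#!/usr/bin/env bash
# Pretend this file was left behind by the dead daemon foo and that its
# number has since been recycled by an unrelated, live process. Here the
# unrelated process is simply this shell itself ($$).
pidfile=$(mktemp)
echo "$$" > "$pidfile"

# The usual liveness check: signal 0 only tests whether *some* process with
# this PID exists and may be signalled. It says nothing about which program.
if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    verdict="foo looks alive"
else
    verdict="foo looks dead"
fi
echo "$verdict (but PID $(cat "$pidfile") is really this shell)"
rm -f "$pidfile"
```

The check reports foo as alive although foo is not running at all, which is exactly the mistake that leads an init script to kill the wrong process.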

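See also: Are PID-files still flawed when doing it 'right'? (http://stackoverflow.com/questions/25906020/are-pid-files-still-flawed-when-doing-it-right/25933330#25933330)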

By the way, even worse than PID files is parsing ps! Don't ever do this.

  1. ps is very unportable. While you'll find it on almost every UNIX system, its arguments vary greatly if you want non-standard output. And its standard output format is ONLY for human consumption, not for scripted parsing!
  2. Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere in its arguments that happens to be the same as the PID you started your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
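The false positives in point 2 are easy to reproduce without touching a live system. The sketch below greps a made-up listing in the style of ps output; every user, PID and command in it is invented for illustration.

```shell
#!/usr/bin/env bash
# Made-up sample in the style of `ps aux` output (columns trimmed). The
# users, PIDs and commands are invented purely for illustration.
sample_ps_output='USER   PID  COMMAND
alice 1234  /usr/local/bin/myserver
bob   5678  tail -n 1234 /var/log/syslog
carol 12345 vim notes.txt'

# Naive check: "is the process with PID 1234 still there?"
matches=$(printf '%s\n' "$sample_ps_output" | grep -c 1234)
echo "lines matching 1234: $matches"
```

Three lines match although only one of them is the daemon: one match is an argument that happens to be 1234, and another is a longer PID that merely contains those digits. Against real ps output, the grep process itself usually shows up as well.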

If you don't want to manage the process yourself, there are some perfectly good systems out there that will act as a monitor for your processes. Look into runit, for example.
