Are PID-files still flawed when doing it 'right'?


Question

Restarting a service is often implemented via a PID file: the process ID is written to some file, and the stop command (also run before a restart) kills the process with that number.

When you think about it (or, if you'd rather not, search for it) you'll find that this is problematic, because every PID can be reused. Imagine a complete server reboot where './your-script.sh start' is called at startup (e.g. via @reboot in crontab). Now your-script.sh will kill an arbitrary process, because the PID it stored comes from before the reboot.

One workaround I can imagine is to store additional information, so that you could do 'ps -pid | grep ' and kill the process only if this returns something. Or are there better options in terms of reliability and/or simplicity?

#!/bin/bash

# Assumed default location; the original script never defines PID_FILE.
PID_FILE="${PID_FILE:-./somejar.pid}"

function start() {
  nohup java -jar somejar.jar >> file.log 2>&1 &
  PID=$!
  # one could even store the full "ps -p $PID" output, but that makes the
  # match too specific, e.g. if some arguments are added later
  echo "$PID somejar.jar" > "$PID_FILE"
}

function stop() {
  if [[ -f "$PID_FILE" ]]; then
    PID=$(cut -f1 -d' ' "$PID_FILE")
    # now get the second field and grep the process's command line for it
    PID_INFO=$(cut -f2 -d' ' "$PID_FILE")
    RES=$(ps -p "$PID" -o args= | grep -F -- "$PID_INFO")
    if [[ -n "$RES" ]]; then
       kill "$PID"
    fi
  fi
}

Solution

The problems with PID files are manifold, not limited to recycling and reboots.

The bigger issue is the fact that there is an unavoidable disconnect/race between the information in the PID file and the state of the process.

This is the flow of using PID files:

  1. You fork & exec a process. The "parent" process knows the PID of the fork and has a guarantee that this PID is reserved exclusively for its fork.
  2. Your parent writes the PID of the fork to a file.
  3. Your parent dies, and with it the guarantee about PID exclusivity.
  4. A different process reads the number in the PID file.
  5. The different process checks whether there is a process on the system with the same PID as the one it read.
  6. The different process sends a signal to the process with the PID it read.

In (1) everything is fine and dandy. We have a PID and we are guaranteed by the kernel that the number is reserved for our intended process.

In (2) you are yielding control of the PID to other processes that do not have this guarantee. This is not a problem in itself, but such an act is rarely, if ever, without fault.

In (3) your parent process dies. It alone had the kernel guarantee on PID exclusivity. It may or may not have done a wait(2) on the PID. The true status of the intended process is lost; all we have left is an identifier in the PID file, which may or may not refer to the intended process.

In (4) a process without any guarantees reads the PID file. Any use of this number succeeds only by chance.

In (5) a process without any guarantees actually uses the identifier for something. This is the first point where we're actually doing something bad: we're querying the kernel using a process identifier that may or may not refer to the intended process. The answer we get back describes the state of whatever process has that PID, not necessarily our intended process.

In (6) we make the worst mistake: we perform a mutating action that is intended to affect the process we originally started, with no guarantee that it does. We could be signalling any random process on the system instead.

Why is this? What kind of stuff can happen to mess with the PID?

Anywhere after (1), the real process may die. As long as the parent retains its guarantee on the PID's exclusivity, the kernel will not recycle the PID. The PID still exists and refers to what used to be your process (we call this a "zombie" process: your real process died, but the PID is still reserved for it alone). No other process can use this PID, and signalling it will not reach any process at all.
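
The zombie state described above can be observed directly. The following is a minimal sketch, assuming a Linux-like system (it reads /proc and falls back to ps); the function names proc_state and zombie_demo are made up for this illustration. An inner sh starts a short-lived child and then replaces itself with a long sleep, which never calls wait, so the dead child stays reserved as a zombie until that parent goes away.

```bash
#!/bin/bash
# Sketch only: proc_state and zombie_demo are hypothetical helper names.

# Print the one-letter state of a process ("Z" means zombie).
proc_state() {
  local pid=$1 state
  if [[ -r /proc/$pid/stat ]]; then
    # Linux: the third field of /proc/PID/stat is the state letter.
    read -r _ _ state _ < "/proc/$pid/stat"
  else
    state=$(ps -o stat= -p "$pid")
  fi
  printf '%s\n' "${state:0:1}"
}

zombie_demo() {
  local tmp holder child
  tmp=$(mktemp) || return 1
  # The inner sh starts a 1-second child, records its PID, then replaces
  # itself with a 5-second sleep. sleep never waits for children, so the
  # dead child remains a zombie for as long as that sleep is alive.
  sh -c 'sleep 1 & echo "$!" > "$1"; exec sleep 5' sh "$tmp" >/dev/null 2>&1 &
  holder=$!
  sleep 2                      # by now the child has exited
  child=$(cat "$tmp")
  proc_state "$child"          # state of the dead, unreaped child
  kill "$holder" 2>/dev/null   # once the parent is gone, the zombie is reaped
  wait "$holder" 2>/dev/null
  rm -f "$tmp"
}
```

Calling zombie_demo takes about two seconds and prints the state letter of the dead child; while the parent is still alive and has not waited, that PID cannot be handed to any other process.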

As soon as the parent releases its guarantee, or after (3), the kernel recycles the PID of the dead process. The zombie is gone and the PID is now free to be used by any newly forked process. Say you're compiling something and thousands of small processes get spawned. The kernel picks a random or sequential (depending on its configuration) new PID for each. You're done, and now you restart Apache. The kernel reuses the freed PID of your dead process for something important.

The PID file still contains the PID, though. Any process that reads the PID file (4) is assuming that this number refers to your long-dead process.

Any action in (5) or (6) that you take with the number you read will target the new process, not the old one.
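
To get a feel for how quickly PIDs are handed out, here is a small sketch. The helper names sample_pids and pid_limit are invented for this example, and /proc/sys/kernel/pid_max is a Linux-specific file, so pid_limit prints "unknown" elsewhere.

```bash
#!/bin/bash
# Sketch only: sample_pids and pid_limit are hypothetical helper names.

# Print the PIDs of n short-lived processes started one after another.
sample_pids() {
  local n=${1:-5} i
  for ((i = 0; i < n; i++)); do
    sh -c 'echo "$$"'   # inside the new sh, $$ is that process's own PID
  done
}

# Print the upper bound of the PID space if the kernel exposes it (Linux).
pid_limit() {
  if [[ -r /proc/sys/kernel/pid_max ]]; then
    cat /proc/sys/kernel/pid_max
  else
    echo "unknown"
  fi
}
```

On a system that allocates PIDs sequentially, sample_pids typically prints increasing numbers, and a busy build can walk through the whole range reported by pid_limit and wrap around, at which point a stale number in a PID file may name an unrelated process.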

Not only that, but you cannot perform any check prior to your action, since there is an unavoidable race between any check you can perform and any action you can perform. If you first look at ps to see what the "name" of your process is (not that this is a really awesome guarantee of anything, please don't do this) and then signal it, the process could still have died, and/or its PID could have been recycled by a new process, in the time between your ps check and your signal. The root of all of these problems is that the kernel is not giving you any exclusive-use guarantees on the PID, since you are not its parent.

Moral of the story: Do NOT give the PID of your children to anyone else. The parent and only the parent should use it, because it is the only one on the system (save the kernel) with any guarantees on the child's existence and identity.

This usually means keeping the parent alive and, instead of signalling the process directly to terminate it, talking to the parent by means of sockets or the like. See http://smarden.org/runit/ et al.
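
As a rough illustration of "talking to the parent", the sketch below keeps a supervising shell alive as the parent of the service and lets other processes request a stop through a FIFO rather than through a PID. Everything here is an assumption made for the example: the names supervise and request_stop, the one-word "stop" protocol, and the use of a FIFO in place of a socket. bash collects the exit status of its children on its own, so this shows only the structure; a real supervisor would be written against wait(2) directly.

```bash
#!/bin/bash
# Sketch only: supervise, request_stop and the "stop" protocol are assumptions.

# Start "$@" as our own child, stay alive, and stop it when asked via the FIFO.
supervise() {
  local fifo=$1 cmd status
  shift
  [[ -p $fifo ]] || mkfifo "$fifo" || return 1
  "$@" &
  local child=$!
  # Open read-write so the FIFO never reports EOF (works on Linux).
  exec 3<>"$fifo"
  # Poll for requests once per second until the child is gone.
  while kill -0 "$child" 2>/dev/null; do
    if read -r -t 1 cmd <&3 && [[ $cmd == stop ]]; then
      kill "$child" 2>/dev/null
    fi
  done
  wait "$child"
  status=$?
  exec 3>&-
  return "$status"
}

# Client side: no PID is involved, only the control FIFO.
# Note: this blocks if no supervisor has the FIFO open.
request_stop() {
  echo stop > "$1"
}
```

A caller would run something like `supervise /path/to/ctl java -jar somejar.jar` in one place and `request_stop /path/to/ctl` in another; the path is a placeholder. The point of the design is that the only process that ever sends a signal is the one that started the child.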
