Checking via SSH whether a specific program is still running, in parallel


Question

I have several machines where I have a program running. Every 30 seconds or so I want to check if those programs are still running. I use the following command to do that.

ssh ${USER}@${HOSTS[i]} "bash -c 'if [[ -z \"\$(pgrep -u ${USER} program)\" ]]; then exit 1; else exit 0; fi'"

Running this on more than 100 machines takes a long time, and I want to speed it up by checking in parallel. I am aware of '&' and 'parallel', but I am unsure how to retrieve the return value (task completed or not).

Recommended answer

The following lets all connections complete before starting any in the next batch, and thus can potentially wait for more than 30 seconds -- but should give you a good idea of how to do what you're looking for:

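hosts=( host1 host2 host3 )
user=someuser
script="script you want to run on each remote host"

# Requires bash 4.0 or newer (associative arrays).
# Start at "30 seconds ago" so the first pass runs immediately.
last_time=$(( SECONDS - 30 ))
# Run a pass at once if 30 seconds have elapsed; otherwise sleep out the remainder.
while (( ( SECONDS - last_time ) >= 30 )) || \
      sleep $(( 30 - (SECONDS - last_time) )); do
  last_time=$SECONDS
  declare -A pids=( )   # maps the PID of each background ssh to its host
  for host in "${hosts[@]}"; do
    # $! is the PID of the ssh that was just started in the background
    ssh "${user}@${host}" "$script" & pids[$!]="$host"
  done
  for pid in "${!pids[@]}"; do
    # wait returns that ssh's exit status: the remote script's status,
    # or 255 if ssh itself failed
    wait "$pid" || {
      echo "Failure monitoring host ${pids[$pid]} at time $SECONDS" >&2
    }
  done
done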
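
To isolate the part of the loop that answers the original question (how to get a return value back from each background job), here is a minimal sketch that runs locally without SSH. The names check_host and failed_hosts are hypothetical, and check_host is only a stand-in that pretends the program is down on any host whose name ends in "-down". In real use its body would be the ssh call; note that pgrep already exits 0 when at least one process matches and 1 when none does, so the remote command can be as short as pgrep -u "$user" program >/dev/null.

```bash
#!/usr/bin/env bash
# Requires bash 4.0+ for associative arrays.

# Hypothetical stand-in for:
#   ssh "$user@$host" "pgrep -u $user program >/dev/null"
# Succeeds unless the host name ends in "-down".
check_host() {
  [[ $1 != *-down ]]
}

# Check every host given as an argument in parallel and print, sorted,
# the names of the hosts whose check failed.
failed_hosts() {
  local -A pids=()
  local failed=()
  local host pid
  for host in "$@"; do
    check_host "$host" & pids[$!]=$host
  done
  for pid in "${!pids[@]}"; do
    # wait returns the exit status of that particular background job
    wait "$pid" || failed+=("${pids[$pid]}")
  done
  if (( ${#failed[@]} )); then
    printf '%s\n' "${failed[@]}" | sort
  fi
}

failed_hosts web1 web2-down db1 db2-down
# prints:
# db2-down
# web2-down
```

The wait loop has to run in the same shell that started the jobs. Piping the loop itself into another command would move it into a subshell, where wait can no longer see those PIDs.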


Now, bigger picture: Don't do that.

Almost every operating system has a process supervision framework. Ubuntu has Upstart; Fedora and CentOS 7 have systemd; MacOS X has launchd; runit, daemontools, and others can be installed anywhere (and are very, very easy to use -- look at the run scripts at http://smarden.org/runit/runscripts.html for examples).

Using these tools is the Right Way to monitor a process and ensure that it restarts whenever it exits. Unlike this (very high-overhead) solution, they have almost no overhead at all, since they rely on the operating system notifying a process's parent when that process exits, rather than polling for the process (and that only after all the overhead of connecting via SSH, negotiating a pair of session keys, starting a shell to run your script, and so on).
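
To make the "parent is notified on exit" idea concrete, here is a toy sketch of what a supervisor does at its core. It is an illustration only, not a substitute for systemd, launchd, or runit. The function name supervise and the max_runs cap are hypothetical, and the cap exists only so the demo terminates. The parent blocks in wait until the kernel reports that the child has exited, then starts it again, with no polling involved.

```bash
#!/usr/bin/env bash
# Toy supervisor (illustration only).
# Usage: supervise MAX_RUNS COMMAND [ARGS...]
supervise() {
  local max_runs=$1; shift
  local run status
  for (( run = 1; run <= max_runs; run++ )); do
    "$@" &                       # start the supervised command
    status=0
    wait "$!" || status=$?       # sleeps until the child exits; no polling
    echo "run $run exited with status $status"
  done
}

supervise 3 sh -c 'exit 7'
# prints:
# run 1 exited with status 7
# run 2 exited with status 7
# run 3 exited with status 7
```

A real supervisor adds what this sketch leaves out: restart throttling, logging, and starting at boot.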

Yes, this may be a small private project. Still, you're making extra complexity (and thus, extra bugs) for yourself -- and if you learn to use the tools to do this right, you'll know how to do things right when you have something that isn't a small private project.
