GNU Parallel - which job failed?

Question

I'm running a job on several different servers (up to 25) using GNU parallel.

The shell script which implements this currently does:

parallel --tag --nonall -S $some_list_of_servers "some_command"
state=$?
echo -n "RESULT: "
if [ "$state" -eq "0" ]
then
    echo "All jobs successful"
else
    echo "$state jobs failed"
fi
return $state

where some_list_of_servers is an array, and some_command is, for instance, git fetch.

What I want is a lot more information than just how many jobs failed. I want to know which command failed, and on which server.

I've been through the man page, Google, and SO, but can't find the switch(es) I'm looking for.

Any help gratefully appreciated.

WeeDom

Edit in response to answer 1:

I tried that, and something odd is happening.

weedom@host1: ~/$ parallel --tag --nonall  -j8 --joblog test.log -S host1,host2 uptime 
host2   10:41:17 up 36 days, 20:45,  1 user,  load average: 0.00, 0.00, 0.00
host1         10:41:17 up 22:34,  3 users,  load average: 0.06, 0.11, 0.04
weedom@host1: ~/$ cat test.log
Seq     Host    Starttime       Runtime Send    Receive Exitval Signal  Command
1       host1        1403689277.067  0.519999980926514       0       0       0      0       uptime

No matter how many hosts I add to -S, I seem to only get the last one to complete into test.log

I've added a follow-up question here: GNU Parallel - --joblog only logging last job

Recommended answer

You want to use the --joblog option, as described in the documentation. GNU Parallel can even re-run just the jobs that failed, with --resume-failed.

For example, take this script (failjob), which fails whenever its argument is a multiple of 3:

#!/bin/bash
# Exit with status 1 when the argument is a multiple of 3, otherwise 0.
jobmod=$(( $1 % 3 ))
if [ "$jobmod" -eq 0 ]
then
    exit 1
else
    exit 0
fi

Running it on several hosts like this produces the following log:

$ seq 1 10 | parallel --joblog out.log -S "srv01,srv02,srv03,srv04" ./failjob 

$ more out.log
Seq Host    Starttime   Runtime Send    Receive Exitval Signal  Command
1   srv01   1403542514.713  0.267   0   0   0   0   ./failjob 1
3   srv02   1403542514.717  0.266   0   0   1   0   ./failjob 3
4   srv03   1403542514.719  0.266   0   0   0   0   ./failjob 4
2   srv04   1403542514.715  0.397   0   0   0   0   ./failjob 2
5   srv01   1403542514.983  0.231   0   0   0   0   ./failjob 5
6   srv02   1403542514.986  0.368   0   0   1   0   ./failjob 6
7   srv03   1403542514.988  0.388   0   0   0   0   ./failjob 7
8   srv04   1403542515.121  0.437   0   0   0   0   ./failjob 8
9   srv01   1403542515.221  0.343   0   0   1   0   ./failjob 9
10  srv02   1403542515.356  0.388   0   0   0   0   ./failjob 10
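
To get from the log to the original question (which command failed, and on which server), the file can be filtered on its Exitval and Signal columns. The following is a minimal sketch, assuming the joblog is tab-separated with the nine columns shown above; failed_jobs is a hypothetical helper name, not part of GNU Parallel, and the sample log it is run on is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: list every job in a GNU Parallel joblog that did not
# finish cleanly. Assumes tab-separated columns in this order:
# Seq Host Starttime Runtime Send Receive Exitval Signal Command
failed_jobs() {
    # Skip the header line, then keep rows whose Exitval (column 7)
    # or Signal (column 8) is non-zero.
    # Prints: host <TAB> exit value <TAB> command
    awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) {
        printf "%s\t%s\t%s\n", $2, $7, $9
    }' "$1"
}

# Demonstration on a small made-up log (two jobs, one of which failed).
sample=$(mktemp)
printf 'Seq\tHost\tStarttime\tRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n' > "$sample"
printf '1\tsrv01\t1403542514.713\t0.267\t0\t0\t0\t0\t./failjob 1\n' >> "$sample"
printf '3\tsrv02\t1403542514.717\t0.266\t0\t0\t1\t0\t./failjob 3\n' >> "$sample"
failed_jobs "$sample"
rm -f "$sample"
```

In the asker's script, a call such as failed_jobs test.log could go in the else branch, after parallel has returned a non-zero status, so that the "jobs failed" message is followed by the hosts and commands involved.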
