GNU Parallel - which job failed?
Question
I'm running a job on several different servers (up to 25) using GNU parallel.
The shell script which implements this currently does:
parallel --tag --nonall -S $some_list_of_servers "some_command"
state=$?
echo -n "RESULT: "
if [ "$state" -eq "0" ]
then
echo "All jobs successful"
else
echo "$state jobs failed"
fi
return $state
where some_list_of_servers is an array, and some_command is, for instance, git fetch.
What I want is a LOT more information than just how many jobs failed. I want to know which command, and which server, failed.
I've been through the man page, Google, and SO, but can't find the switch(es) that I'm looking for.
Any help gratefully appreciated.
WeeDom
Edit in response to answer 1:
I tried that, and something odd is happening.
weedom@host1: ~/$ parallel --tag --nonall -j8 --joblog test.log -S host1,host2 uptime
host2 10:41:17 up 36 days, 20:45, 1 user, load average: 0.00, 0.00, 0.00
host1 10:41:17 up 22:34, 3 users, load average: 0.06, 0.11, 0.04
weedom@host1: ~/$ cat test.log
Seq Host Starttime Runtime Send Receive Exitval Signal Command
1 host1 1403689277.067 0.519999980926514 0 0 0 0 uptime
No matter how many hosts I add to -S, only the last job to complete seems to end up in test.log.
I've added a follow-up question here: GNU Parallel - --joblog only logging last job
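Recommended answer
You want to use the --joblog option, as described in the documentation. GNU Parallel even allows re-running just the failed jobs with --resume-failed.
For instance, running this script: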
#!/bin/bash
jobmod=$(( $1 % 3 ))
if [ $jobmod == 0 ]
then
exit 1
else
exit 0
fi
on several hosts like this:
$ seq 1 10 | parallel --joblog out.log -S "srv01,srv02,srv03,srv04" ./failjob
gives
$ more out.log
Seq Host Starttime Runtime Send Receive Exitval Signal Command
1 srv01 1403542514.713 0.267 0 0 0 0 ./failjob 1
3 srv02 1403542514.717 0.266 0 0 1 0 ./failjob 3
4 srv03 1403542514.719 0.266 0 0 0 0 ./failjob 4
2 srv04 1403542514.715 0.397 0 0 0 0 ./failjob 2
5 srv01 1403542514.983 0.231 0 0 0 0 ./failjob 5
6 srv02 1403542514.986 0.368 0 0 1 0 ./failjob 6
7 srv03 1403542514.988 0.388 0 0 0 0 ./failjob 7
8 srv04 1403542515.121 0.437 0 0 0 0 ./failjob 8
9 srv01 1403542515.221 0.343 0 0 1 0 ./failjob 9
10 srv02 1403542515.356 0.388 0 0 0 0 ./failjob 10
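To answer the "which command, on which server" part directly, the job log can be filtered on its Exitval and Signal columns. The following is a minimal sketch, not part of the original answer: the function name failed_jobs and the demo log are hypothetical, and it assumes the job log is tab-separated with the nine-column header shown above.

```shell
#!/bin/bash
# failed_jobs (hypothetical helper): print host, exit value, signal and command
# for every job in a GNU Parallel joblog that exited non-zero or was killed by
# a signal. Assumes a tab-separated log whose columns are:
# Seq Host Starttime Runtime Send Receive Exitval Signal Command
failed_jobs() {
    awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) {
        printf "%s\texit=%s\tsignal=%s\t%s\n", $2, $7, $8, $9
    }' "$1"
}

# Demo on a two-line log modelled on the output above (illustrative data only).
demo_log=$(mktemp)
printf 'Seq\tHost\tStarttime\tRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n' > "$demo_log"
printf '1\tsrv01\t1403542514.713\t0.267\t0\t0\t0\t0\t./failjob 1\n' >> "$demo_log"
printf '3\tsrv02\t1403542514.717\t0.266\t0\t0\t1\t0\t./failjob 3\n' >> "$demo_log"
failed_jobs "$demo_log"
rm -f "$demo_log"
```

Run against the real out.log, this would list srv02 and srv01 with the ./failjob 3, 6 and 9 invocations. Repeating the original parallel command line with --resume-failed added alongside the same --joblog file should then retry only those jobs.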