for loop bash scripts parallel

Question

I'm attempting a loop script, where each loop runs in parallel if possible.

#!/bin/bash

for ip in $(cat ./IPs); do
ping -n -c 2 -W 1 ${ip} >> Results/${ip}.log
done

Ultimately, I'd like to place whatever I need in the loop, and have it multi-process. I've attempted to work through the other examples, but just can't seem to get it to work as expected. I have parallel installed as well if that's an option.

Answer

The simple version of this -

while read ip
do  ping -n -c 2 -W 1 ${ip} >> Results/${ip}.log 2>&1 &
done < IPs

The & on the end puts it in background and lets the loop run the next iteration while that one is processing. As an added point, I also redirected the stderr of each to the same log (2>&1) so they wouldn't get lost if something failed.

$: ls x a # x exists, a doesn't
ls: cannot access 'a': No such file or directory
x
$: ls x a > log # send stdout to log, but error still goes to console
ls: cannot access 'a': No such file or directory
$: cat log # log only has success message
x
$: ls x a > log 2>&1 # send stderr where stdout is going - to same log
$: cat log # now both messages in the log
ls: cannot access 'a': No such file or directory
x
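One pitfall worth flagging here: the order of the redirections matters, because 2>&1 duplicates wherever stdout points at that moment. A minimal sketch (the temp directory and file names are arbitrary):

```shell
# 2>&1 copies stderr to wherever stdout points *right now*, so order matters.
cd "$(mktemp -d)"
ls nosuchfile > log 2>&1 || true    # stdout -> log first, THEN stderr joins it
cat log                             # the error message shows up in the log
ls nosuchfile 2>&1 > log2 || true   # stderr joins the OLD stdout (the console),
                                    # then only stdout is pointed at log2
```

In the second form the error still lands on the console and log2 stays empty, which is the classic surprise.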

I also switched to a while read to avoid needing the cat in the for, but that's mostly stylistic preference.
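(It isn't purely stylistic, though: an unquoted $(cat file) word-splits and glob-expands each line, while read hands you the line whole. A quick sketch - the file name /tmp/demo.list is just for illustration:)

```shell
printf 'a b\n' > /tmp/demo.list           # ONE line containing a space
for x in $(cat /tmp/demo.list); do        # unquoted expansion word-splits...
    echo "for: [$x]"                      # ...so this loop runs twice
done
while read -r x; do
    echo "while: [$x]"                    # read -r keeps the line intact: one pass
done < /tmp/demo.list
rm -f /tmp/demo.list
```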

I made a simplistic control file that just has a letter per line -

$: cat x
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z

Then declared a couple of values - a max I want it to fire at once, and a counter.

$: declare -i cnt=0 max=10

Then I typed in a read loop to iterate over the values, and run a set at a time. Until it accumulates the stated max, it keeps adding processes in background and counting them. Once it gets enough, it waits for those to finish and resets the counter before continuing with another set. (Note that as typed, the iteration that trips the max check never gets started - k and v don't appear in the output - so a real script would want to re-queue that item.)

$: while read ctl             # these would be your IP's
> do if (( cnt++ < max ))     # this checks for max load
>    then echo starting $ctl  # report which we're doing
>         date                # throw a timestamp
>         sleep 10 &          # and fire the task in background
>    else echo letting that batch work... # when too many running
>         cnt=0               # reset the counter
>         wait                # and thumb-twiddle till they all finish
>         echo continuing     # log
>         date                # and timestamp
>    fi
> done < x                    # the whole loop reads from x until done

Here's the output.

starting a
Thu, Oct 25, 2018  8:13:34 AM
[1] 10436
starting b
Thu, Oct 25, 2018  8:13:34 AM
[2] 7544
starting c
Thu, Oct 25, 2018  8:13:34 AM
[3] 10296
starting d
Thu, Oct 25, 2018  8:13:34 AM
[4] 6244
starting e
Thu, Oct 25, 2018  8:13:34 AM
[5] 8560
starting f
Thu, Oct 25, 2018  8:13:35 AM
[6] 8824
starting g
Thu, Oct 25, 2018  8:13:35 AM
[7] 11640
starting h
Thu, Oct 25, 2018  8:13:35 AM
[8] 9856
starting i
Thu, Oct 25, 2018  8:13:35 AM
[9] 7612
starting j
Thu, Oct 25, 2018  8:13:35 AM
[10] 9100
letting that batch work...
[1]   Done                    sleep 10
[2]   Done                    sleep 10
[3]   Done                    sleep 10
[4]   Done                    sleep 10
[5]   Done                    sleep 10
[6]   Done                    sleep 10
[7]   Done                    sleep 10
[8]   Done                    sleep 10
[9]-  Done                    sleep 10
[10]+  Done                    sleep 10
continuing
Thu, Oct 25, 2018  8:13:45 AM
starting l
Thu, Oct 25, 2018  8:13:45 AM
[1] 8600
starting m
Thu, Oct 25, 2018  8:13:45 AM
[2] 516
starting n
Thu, Oct 25, 2018  8:13:45 AM
[3] 3296
starting o
Thu, Oct 25, 2018  8:13:45 AM
[4] 8608
starting p
Thu, Oct 25, 2018  8:13:46 AM
[5] 4040
starting q
Thu, Oct 25, 2018  8:13:46 AM
[6] 7476
starting r
Thu, Oct 25, 2018  8:13:46 AM
[7] 4468
starting s
Thu, Oct 25, 2018  8:13:46 AM
[8] 4144
starting t
Thu, Oct 25, 2018  8:13:46 AM
[9] 8956
starting u
Thu, Oct 25, 2018  8:13:46 AM
[10] 6864
letting that batch work...
[1]   Done                    sleep 10
[2]   Done                    sleep 10
[3]   Done                    sleep 10
[4]   Done                    sleep 10
[5]   Done                    sleep 10
[6]   Done                    sleep 10
[7]   Done                    sleep 10
[8]   Done                    sleep 10
[9]-  Done                    sleep 10
[10]+  Done                    sleep 10
continuing
Thu, Oct 25, 2018  8:13:56 AM
starting w
Thu, Oct 25, 2018  8:13:56 AM
[1] 5520
starting x
Thu, Oct 25, 2018  8:13:56 AM
[2] 6436
starting y
Thu, Oct 25, 2018  8:13:57 AM
[3] 12216
starting z
Thu, Oct 25, 2018  8:13:57 AM
[4] 8468

And when finished, the last few are still running because I didn't go to the trouble of writing all this to an actual script with meticulous checking.

$: ps
      PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND
    11012   10944   11012      11040  pty0     2136995 07:59:35 /usr/bin/bash
     6436   11012    6436       9188  pty0     2136995 08:13:56 /usr/bin/sleep
     5520   11012    5520      10064  pty0     2136995 08:13:56 /usr/bin/sleep
    12216   11012   12216      12064  pty0     2136995 08:13:57 /usr/bin/sleep
     8468   11012    8468      10100  pty0     2136995 08:13:57 /usr/bin/sleep
     9096   11012    9096      10356  pty0     2136995 08:14:03 /usr/bin/ps

This does cause burst loads that (for tasks that don't all finish at about the same time) will dwindle till the last is done, causing spikes and lulls. With a little more finesse we could write a waitpid trap that would fire a new job each time one finished to keep the load steady, but that's an exercise for another day unless someone just really wants to see it. (I did it in Perl before, and have kind of always wanted to implement it in bash just because...)
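That steady-load idea can actually be sketched with bash's own wait -n (bash 4.3+), without a trap: refill the pool the moment any one job exits. The sleep durations below are stand-ins for the real tasks, and the counts are arbitrary:

```shell
#!/usr/bin/env bash
# Steady pool: never more than $max jobs; start a replacement as soon
# as any ONE background job finishes, instead of waiting for a whole batch.
declare -i max=3 running=0
tmp=$(mktemp)
for n in 1 2 3 4 5 6 7 8; do              # stand-ins for your IPs/tasks
    if (( running >= max )); then
        wait -n                           # block until any single job exits
        (( running-- ))
    fi
    { sleep 0.2; echo "$n" >> "$tmp"; } & # fire the task in background
    (( ++running ))                       # pre-increment: never evaluates to 0
done
wait                                      # collect the stragglers
count=$(wc -l < "$tmp"); rm -f "$tmp"
echo "completed $count tasks"
```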

Since it was requested -

Obviously, as presented in other posts, you could just use parallel... but as an exercise, here's one way you could set a number of process chains that would read from a queue. I opted for simple callback rather than dealing with a SIGCHLD trap because there are a lot of little subprocs flying around...

Refinements welcome if anyone cares.
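For reference, the short route mentioned above - a sketch of the fan-out with GNU parallel, plus a near-equivalent with xargs -P; echo stands in for the real ping so the example is self-contained, and /tmp/ips.demo is illustrative:

```shell
# With GNU parallel (if installed), each line of the IPs file replaces {}:
#   parallel -j10 'ping -n -c 2 -W 1 {} >> Results/{}.log 2>&1' :::: IPs
# The same fan-out via xargs -P, demonstrated with echo in place of ping:
printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.3 > /tmp/ips.demo
xargs -P4 -I{} sh -c 'echo "pinged $1"' _ {} < /tmp/ips.demo | sort
rm -f /tmp/ips.demo
```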

#!/usr/bin/env bash

trap 'echo abort $0@$LINENO; die; exit 1' ERR       # make sure any error is fatal
declare -i primer=0          # a countdown of how many processes to pre-spawn
use="
  $0 <#procs> <cmdfile>

  Pass the number of desired processes to prespawn as the 1st argument.
  Pass the command file with the list of tasks you need done.

  Command file format:
   KEYSTRING:cmdlist

  where KEYSTRING will be used as a unique logfile name
  and   cmdlist   is the base command string to be run

"

die() {
   echo "$use" >&2
   return 1
}

case $# in
2) primer=$1
   case "$primer" in
   *[^0-9]*) echo "INVALID #procs '$primer'"
             die;;
   esac
   cmdfile=$2
   [[ -r "$cmdfile" ]] || die
   declare -i lines=$( grep -c . $cmdfile)
   if (( lines < primer ))
   then echo "Note - command lines in $cmdfile ($lines) fewer than requested process chains ($primer)"
        die
   fi ;;
*) die ;;
esac >&2

trap ': no-op to ignore' HUP  # ignore hangups (built-in nohup without explicit i/o redirection)

spawn() {
  IFS="$IFS:" read key cmd || return
  echo "$(date) executing '$cmd'; c.f. $key.log" | tee $key.log
  echo "# autogenerated by $0 $(date)
   { $cmd
     spawn
   } >> $key.log 2>&1 &
  " >| $key.sh
  . $key.sh
  rm -f $key.sh
  return 0
}

while (( primer-- ))  # until we've filled the requested quota
do spawn              # create a child process
done < $cmdfile

Yes, there are security concerns with reading possibly dirty data and sourcing it. I wanted to keep the framework simple as an exercise. Suggestions are still welcome.

I threw together a quick command file with some complex commands built of simple crap just as examples.

a:for x in $( seq 1 10 );do echo "on $x";date;sleep 1;done &
b:true && echo ok || echo no
c:false && echo ok || echo no
d:date > /tmp/x; cat /tmp/x
e:date;sleep 5;date
f:date;sleep 13;date
g:date;sleep 1;date
h:date;sleep 5;date
i:date;sleep 17;date
j:date;sleep 1;date
k:date;sleep 9;date
l:date;sleep 19;date
m:date;sleep 7;date
n:date;sleep 19;date
o:date;sleep 11;date
p:date;sleep 17;date
q:date;sleep 6;date
r:date;sleep 7;date
s:date;sleep 18;date
t:date;sleep 6;date
u:date;sleep 9;date
v:date;sleep 9;date
w:date;sleep 2;date
x:date;sleep 0;date
y:date;sleep 3;date
z:date;sleep 10;date

Note the first one even runs itself in background - the spooler doesn't care. Job a will start b before spool can, so it will skip to c.

Some logs -

a - original spawn; ran itself in background and immediately started b, then kept logging

Thu, Oct 25, 2018  2:33:57 PM executing 'for x in $( seq 1 10 );do echo "on $x";date;sleep 1;done &'; c.f. a.log
on 1
Thu, Oct 25, 2018  2:33:58 PM executing 'true && echo ok || echo no'; c.f. b.log
Thu, Oct 25, 2018  2:33:58 PM
on 2
Thu, Oct 25, 2018  2:33:59 PM
on 3
Thu, Oct 25, 2018  2:34:00 PM
on 4
Thu, Oct 25, 2018  2:34:01 PM
on 5
Thu, Oct 25, 2018  2:34:02 PM
on 6
Thu, Oct 25, 2018  2:34:04 PM
on 7
Thu, Oct 25, 2018  2:34:05 PM
on 8
Thu, Oct 25, 2018  2:34:06 PM
on 9
Thu, Oct 25, 2018  2:34:07 PM
on 10
Thu, Oct 25, 2018  2:34:08 PM

b - exited quickly and started f because c, d, & e had already been run

Thu, Oct 25, 2018  2:33:58 PM executing 'true && echo ok || echo no'; c.f. b.log
ok
Thu, Oct 25, 2018  2:33:58 PM executing 'date;sleep 13;date'; c.f. f.log

c - original spawn; finished before b, so it started d, which is why b started f

Thu, Oct 25, 2018  2:33:58 PM executing 'false && echo ok || echo no'; c.f. c.log
no
Thu, Oct 25, 2018  2:33:58 PM executing 'date > /tmp/x; cat /tmp/x'; c.f. d.log

d - started by c; finished and started h because g had already been run

Thu, Oct 25, 2018  2:33:58 PM executing 'date > /tmp/x; cat /tmp/x'; c.f. d.log
Thu, Oct 25, 2018  2:33:58 PM
Thu, Oct 25, 2018  2:33:59 PM executing 'date;sleep 5;date'; c.f. h.log

e - original spawn; started n because everything up to that had been run

Thu, Oct 25, 2018  2:33:58 PM executing 'date;sleep 5;date'; c.f. e.log
Thu, Oct 25, 2018  2:33:58 PM
Thu, Oct 25, 2018  2:34:04 PM
Thu, Oct 25, 2018  2:34:04 PM executing 'date;sleep 19;date'; c.f. n.log

(Skipping ahead...)

n - started by e; took long enough to finish that there were no more tasks to start

Thu, Oct 25, 2018  2:34:04 PM executing 'date;sleep 19;date'; c.f. n.log
Thu, Oct 25, 2018  2:34:04 PM
Thu, Oct 25, 2018  2:34:23 PM

It works. It isn't perfect, but it could be handy. :)
