How to parallelize for-loop in bash limiting number of processes


Question


I have a bash script similar to:

NUM_PROCS=$1
NUM_ITERS=$2

for ((i=0; i<$NUM_ITERS; i++)); do
    python foo.py $i arg2 &
done

What's the most straightforward way to limit the number of parallel processes to NUM_PROCS? I'm looking for a solution that doesn't require packages/installations/modules (like GNU Parallel) if possible.

When I tried Charles Duffy's latest approach, I got the following error from bash -x:

+ python run.py args 1
+ python run.py ... 3
+ python run.py ... 4
+ python run.py ... 2
+ read -r line
+ python run.py ... 1
+ read -r line
+ python run.py ... 4
+ read -r line
+ python run.py ... 2
+ read -r line
+ python run.py ... 3
+ read -r line
+ python run.py ... 0
+ read -r line

... continuing with other numbers between 0 and 5, until too many processes were started for the system to handle and the bash script was shut down.

Solution

As a very simple implementation, depending on a version of bash new enough to have wait -n (which waits until the next job exits, as opposed to waiting for all jobs):

#!/bin/bash
#      ^^^^ - NOT /bin/sh!

num_procs=$1
num_iters=$2

for ((i=0; i<num_iters; i++)); do
  # jobs -p -r prints one PID per line, so collect all of them with
  # mapfile; a plain read -a would only see the first line and the
  # count would never exceed 1.
  while mapfile -t curr_jobs < <(jobs -p -r) \
        && (( ${#curr_jobs[@]} >= num_procs )); do
    wait -n
  done
  python foo.py "$i" arg2 &
done
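
The behavior of wait -n can be seen in isolation (the sleep durations here are purely illustrative):

```shell
#!/bin/bash
# wait -n (bash 4.3+) returns as soon as ANY one background job exits,
# rather than waiting for all of them.
sleep 2 &
sleep 0.1 &
wait -n              # returns once the 0.1s sleep finishes
jobs -p -r | wc -l   # the 2s sleep is still running: prints 1
wait                 # now block until everything is done
```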

If running on a shell without wait -n, one can (very inefficiently) replace it with a command such as sleep 0.2, to poll every 1/5th of a second.
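
Spelled out, that polling fallback might look like the following (a sketch only; `python foo.py` stands in for the real per-iteration command, and the 0.2-second interval is arbitrary):

```shell
#!/bin/bash
num_procs=$1
num_iters=$2

for ((i=0; i<num_iters; i++)); do
  # Count running background jobs; jobs -p -r prints one PID per line.
  while (( $(jobs -p -r | wc -l) >= num_procs )); do
    sleep 0.2   # poll instead of wait -n
  done
  python foo.py "$i" arg2 &
done
wait   # let the final batch of jobs finish
```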


Since you're actually reading input from a file, another approach is to start N subprocesses, each of which processes only the lines where (linenum % N == threadnum):

num_procs=$1
infile=$2
for ((i=0; i<num_procs; i++)); do
  (
    while read -r line; do
      echo "Thread $i: processing $line"
    done < <(awk -v num_procs="$num_procs" -v i="$i" \
                 'NR % num_procs == i { print }' <"$infile")
  ) &
done
wait # wait for all $num_procs subprocesses to finish
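
The splitting can be checked by running the awk filter on its own (the six-line input here is illustrative):

```shell
# Worker i of num_procs gets every line whose 1-based line number
# satisfies NR % num_procs == i. With 3 workers, worker 1 gets
# lines 1 and 4:
printf '%s\n' a b c d e f |
  awk -v num_procs=3 -v i=1 'NR % num_procs == i { print }'
# prints:
# a
# d
```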
