Running a limited number of child processes in parallel in bash?


Question

I have a large set of files for which some heavy processing needs to be done. This processing is single-threaded, uses a few hundred MiB of RAM (on the machine used to start the job) and takes a few minutes to run. My current use case is to start a Hadoop job on the input data, but I've had this same problem in other cases before.

In order to fully utilize the available CPU power, I want to be able to run several of those tasks in parallel.

However, a very simple shell script like this will trash system performance due to excessive load and swapping, because it starts every job at once:

find . -type f | while IFS= read -r name
do
    # no limit here: every file gets its own background process immediately
    some_heavy_processing_command "$name" &
done

So what I want is essentially similar to what "gmake -j4" does.

I know bash supports the "wait" command, but that only waits until all child processes have completed. In the past I've written scripts that run "ps" and then grep the child processes out by name (yes, I know ... ugly).

What is the simplest/cleanest/best solution to do what I want?

Thanks to Frederik: yes, this is indeed a duplicate of "How to limit number of threads/sub-processes used in a function in bash". The "xargs --max-procs=4" approach works like a charm. (So I voted to close my own question.)

Recommended answer

#! /usr/bin/env bash

set -o monitor
# enable job control, so each background job runs in its own process group
trap add_next_job CHLD
# run add_next_job each time a child process exits (SIGCHLD)

mapfile -t todo_array < <(find . -type f) # one array element per line of find output (needs bash 4+)

index=0
max_jobs=2

function add_next_job {
    # if there are still jobs to do, start one
    if [[ $index -lt ${#todo_array[@]} ]]
    # the # inside ${#...} is not a comment: it is bash's syntax for the array length
    then
        echo "adding job ${todo_array[$index]}"
        do_job "${todo_array[$index]}" &
        # replace the line above with the command you want
        index=$((index + 1))
    fi
}

function do_job {
    echo "starting job $1"
    sleep 2
}

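# add initial set of jobs (stop early if there are fewer files than max_jobs,
# otherwise this loop would never end)
while [[ $index -lt $max_jobs && $index -lt ${#todo_array[*]} ]]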
do
    add_next_job
done

# wait for all jobs to complete
wait
echo "done"
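
As a possible alternative to the trap-based script, here is a minimal sketch built on "wait -n", which returns as soon as any one background job exits rather than waiting for all of them. It assumes bash 4.3 or later (older versions do not have "wait -n"). The do_job function, the job names a..e and the log file are hypothetical stand-ins for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash 4.3 or later, where "wait -n" is available.

max_jobs=2
log=$(mktemp)                 # scratch file where each job records completion

# Hypothetical stand-in for some_heavy_processing_command.
do_job() {
    sleep 1
    echo "finished $1" >> "$log"
}

for name in a b c d e
do
    # If max_jobs are already running, block until any one of them exits.
    while (( $(jobs -pr | wc -l) >= max_jobs ))
    do
        wait -n
    done
    do_job "$name" &
done

wait                          # wait for the remaining jobs
finished=$(grep -c '^finished ' "$log")
echo "$finished jobs finished"
rm -f "$log"
```

In real use the fixed list of names would be replaced by the file names to process, and do_job by the actual command.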

Having said that, Fredrik makes the excellent point that xargs does exactly what you want...
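
For reference, a minimal sketch of the xargs approach. It assumes an xargs that supports -0 and -P (GNU and BSD xargs both do; "--max-procs" is the GNU long form of -P). The scratch directory, the dummy input files and the "sh -c" one-liner are hypothetical stand-ins for the real input files and for some_heavy_processing_command.

```shell
#!/bin/sh
# Sketch only: cap concurrency with xargs instead of hand-rolled job tracking.

workdir=$(mktemp -d)          # scratch directory with a few dummy input files
for i in 1 2 3 4 5
do
    touch "$workdir/input$i.dat"
done

max_jobs=4

# -print0 / -0 : NUL-separated names, safe for spaces in file names
# -n 1         : pass one file to each command invocation
# -P           : run at most $max_jobs commands at the same time
# The trailing "_" fills $0 of sh -c, so the file name arrives as $1.
results=$(find "$workdir" -type f -print0 |
    xargs -0 -n 1 -P "$max_jobs" sh -c 'sleep 1; echo "processed $1"' _)

processed=$(printf '%s\n' "$results" | grep -c '^processed ')
echo "$processed files processed"
rm -rf "$workdir"
```

Applied to the question's case, this would reduce to something like: find . -type f -print0 | xargs -0 -n 1 -P 4 some_heavy_processing_command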
