Running a limited number of child processes in parallel in bash?

Problem description

I have a large set of files that need some heavy processing. The processing is single-threaded, uses a few hundred MiB of RAM (on the machine used to start the job), and takes a few minutes to run. My current use case is starting a Hadoop job on the input data, but I've had this same problem in other cases before.

To fully utilize the available CPU power, I want to be able to run several of those tasks in parallel.

However a very simple example shell script like this will trash the system performance due to excessive load and swapping:

find . -type f | while IFS= read -r name
do
    # every file is started in the background at once, with no limit
    some_heavy_processing_command "${name}" &
done

So what I want is essentially similar to what "gmake -j4" does.

I know bash supports the "wait" command, but that only waits until all child processes have completed. In the past I've written scripts that run 'ps' and then grep the child processes out by name (yes, I know ... ugly).
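
For what it's worth, bash 4.3 and later also accept "wait -n", which returns as soon as any one background job finishes rather than all of them. A minimal sketch of a job pool built on that, assuming bash 4.3+ is available; run_pool and its argument order are hypothetical names made up for this illustration:

```shell
#!/usr/bin/env bash
# Sketch only: requires bash >= 4.3 for `wait -n`.
# run_pool MAX CMD ITEM... runs CMD once per ITEM, at most MAX at a time.
# run_pool is a hypothetical helper name, not a bash builtin.
run_pool() {
    local max=$1 cmd=$2
    shift 2
    local running=0 item
    for item in "$@"; do
        if (( running >= max )); then
            wait -n                      # block until any one background job exits
            running=$((running - 1))
        fi
        "$cmd" "$item" &                 # start the next job in the background
        running=$((running + 1))
    done
    wait                                 # wait for the jobs still running
}
```

It could be called as: run_pool 4 some_heavy_processing_command file1 file2 file3. The counter only tracks how many jobs were started and reaped, so this avoids scanning 'ps' output by name.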

What is the simplest/cleanest/best solution to do what I want?


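Edit: Thanks to Frederik: yes, this is indeed a duplicate of "How to limit number of threads used in a function in bash" (http://stackoverflow.com/questions/6511884/how-to-limit-number-of-threads-used-in-a-function-in-bash). "xargs --max-procs=4" works like a charm. (So I voted to close my own question.)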

Solution

#!/usr/bin/env bash

set -o monitor
# enable job control so that bash runs the CHLD trap as each background job exits

# read the file list into an array, one path per line (mapfile needs bash 4+)
mapfile -t todo_array < <(find . -type f)

index=0
max_jobs=2

function add_next_job {
    # if there are still jobs to do, start the next one
    # (the # in ${#todo_array[@]} is not a comment; it yields the array length)
    if [[ $index -lt ${#todo_array[@]} ]]
    then
        echo "adding job ${todo_array[$index]}"
        do_job "${todo_array[$index]}" &
        # replace the line above with the command you want
        index=$((index + 1))
    fi
}

function do_job {
    echo "starting job $1"
    sleep 2
}

trap add_next_job CHLD
# run add_next_job each time a child process completes

# add the initial set of jobs (stop early if there are fewer files than max_jobs)
while [[ $index -lt $max_jobs && $index -lt ${#todo_array[@]} ]]
do
    add_next_job
done

# wait for all jobs to complete; wait returns early with a status above 128
# each time the CHLD trap fires, so retry until it returns 0
until wait
do
    :
done
echo "done"

Having said that, Fredrik makes the excellent point that xargs does exactly what you want.
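
A minimal sketch of that xargs approach, assuming an xargs that supports -0 and -P (the short form of GNU's --max-procs); run_limited is a hypothetical wrapper name used only for this example:

```shell
#!/usr/bin/env bash
# run_limited MAX DIR CMD [ARGS...]
# Runs CMD once per regular file under DIR, with at most MAX running at a time.
# run_limited is a hypothetical helper name chosen for this sketch.
run_limited() {
    local max=$1 dir=$2
    shift 2
    # -print0 / -0 keep file names with spaces or newlines intact;
    # -n 1 passes one file per invocation; -P caps the number of parallel processes.
    find "$dir" -type f -print0 | xargs -0 -n 1 -P "$max" "$@"
}
```

For the question's case this would be called as: run_limited 4 . some_heavy_processing_command. Passing one file per invocation matches the original loop; raising -n would batch several files into each call if the command accepts that.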
