Running a limited number of child processes in parallel in bash?
Problem description
I have a large set of files for which some heavy processing needs to be done. This processing is single-threaded, uses a few hundred MiB of RAM (on the machine used to start the job) and takes a few minutes to run. My current use case is to start a Hadoop job on the input data, but I've had this same problem in other cases before.
In order to fully utilize the available CPU power I want to be able to run several of those tasks in parallel.
However, a very simple example shell script like this will thrash the system due to excessive load and swapping:
find . -type f | while read -r name
do
  some_heavy_processing_command "$name" &
done
So what I want is essentially similar to what "gmake -j4" does.
I know bash supports the "wait" command, but that only waits until all child processes have completed. In the past I've written scripts that run 'ps' and then grep the child processes out by name (yes, I know ... ugly).
What is the simplest/cleanest/best solution to do what I want?
Edit: Thanks to Frederik: yes, indeed this is a duplicate of How to limit number of threads used in a function in bash (http://stackoverflow.com/questions/6511884/how-to-limit-number-of-threads-used-in-a-function-in-bash). The "xargs --max-procs=4" works like a charm. (So I voted to close my own question.)
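For reference, the xargs route mentioned above can be written as a one-liner. A minimal self-contained sketch, where `echo` stands in for the real processing command and the demo directory is created just so the example runs anywhere (`-print0`/`-0` keep filenames with spaces intact):

```shell
# build a small demo tree so the example is self-contained
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/c.txt"

# run at most 4 invocations in parallel, one file per invocation
find "$demo" -type f -print0 | xargs -0 --max-procs=4 -n 1 echo processing
```

Each file is handed to one invocation (`-n 1`), and xargs keeps at most 4 of them running at once (`--max-procs=4`, or `-P 4` for short).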
#! /usr/bin/env bash
set -o monitor
# enable job control, so the CHLD trap below fires when a background job exits
trap add_next_job CHLD
# execute add_next_job when we receive a child complete signal
todo_array=($(find . -type f)) # places output into an array (caveat: splits on whitespace in filenames)
index=0
max_jobs=2
function add_next_job {
# if still jobs to do then add one
if [[ $index -lt ${#todo_array[*]} ]]
# ${#todo_array[*]} is bash's syntax for the length of the array
then
echo "adding job ${todo_array[$index]}"
do_job "${todo_array[$index]}" &
# replace the line above with the command you want
index=$(($index+1))
fi
}
function do_job {
echo "starting job $1"
sleep 2
}
# add initial set of jobs
while [[ $index -lt $max_jobs ]]
do
add_next_job
done
# wait for all jobs to complete
wait
echo "done"
Having said that, Fredrik makes the excellent point that xargs does exactly what you want...
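As a footnote, newer bash (4.3 and later) also offers `wait -n`, which blocks until any one child exits; this gives a much shorter job-limited loop than the CHLD trap. A minimal sketch, where `do_job` is a stand-in for the real heavy command and the log file exists only so the demo's results are visible:

```shell
#!/usr/bin/env bash
# stand-in for the real heavy command; the log is just for demonstration
log=$(mktemp)
do_job() { sleep 1; echo "done $1" >> "$log"; }

max_jobs=4
for name in job1 job2 job3 job4 job5 job6; do
    # block while max_jobs children are already running
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        wait -n   # bash 4.3+: returns as soon as any one child exits
    done
    do_job "$name" &
done
wait   # wait for the remaining children
echo "all jobs finished"
```

In a real script the `for` loop would iterate over the input files instead of the hard-coded job names; the `jobs -rp | wc -l` count is what caps the number of concurrent children.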