What simple mechanism for synchronous Unix pooled processes?


Question

I need to limit the number of processes being executed in parallel. For instance, I'd like to execute this pseudo-command line:

export POOL_PARALLELISM=4
for i in `seq 100` ; do
    pool foo -bar &
done

pool foo -bar # would not complete until the first 100 finished.

Therefore, despite 101 foos being queued up to run, only 4 would be running at any given time. pool would fork()/exit() and queue the remaining processes until complete.

Is there a simple mechanism to do this with Unix tools? at and batch don't apply because they generally invoke at the top of the minute and execute jobs sequentially. Using a queue is not necessarily the best, because I want these synchronous.

Before I write a C wrapper employing semaphores and shared memory, and then debug the deadlocks I'll surely introduce, can anyone recommend a bash/shell or other tool mechanism to accomplish this?
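For what it's worth, the pool behaviour sketched in the question can be approximated in plain bash job control, with no C wrapper at all. This is only a sketch of the idea, not the asker's `pool` tool: it assumes bash 4.3 or later for `wait -n`, and `sleep` stands in for the hypothetical `foo -bar`:

```shell
#!/usr/bin/env bash
# Sketch of the pseudo-command line above using only bash job control.
# Assumes bash 4.3+ for `wait -n`; `sleep` stands in for `foo -bar`.
POOL_PARALLELISM=4

for i in $(seq 20); do
    # If the pool is full, block until any one background job exits.
    while (( $(jobs -rp | wc -l) >= POOL_PARALLELISM )); do
        wait -n
    done
    sleep 0.2 &   # the pooled job
done

wait   # synchronous: don't continue until every queued job has finished
echo "all jobs finished"
```

The `jobs -rp | wc -l` check caps the number of running children, and the final `wait` gives the "would not complete until finished" semantics from the question.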

Answer

There's definitely no need to write this tool yourself; there are several good choices.

make

make can do this pretty easily, but it does rely extensively on files to drive the process. (If you want to run some operation on every input file that produces an output file, this might be awesome.) The -j command-line option will run the specified number of tasks, and the -l load-average option will specify a system load average that must be met before starting new tasks. (Which might be nice if you wanted to do some work "in the background". Don't forget about the nice(1) command, which can also help here.)

So, a quick (and untested) Makefile for image converting:

ALL=$(patsubst cimg%.jpg,thumb_cimg%.jpg,$(wildcard *.jpg))

.PHONY: all

all: $(ALL)

thumb_cimg%.jpg: cimg%.jpg
	convert $< -resize 100x100 $@

If you run this with make, it'll run one at a time. If you run with make -j8, it'll run eight separate jobs, and if you run make -j, it'll start hundreds. (When compiling source code, I find that twice the number of cores is an excellent starting point; that gives each processor something to do while waiting on disk I/O requests. Different machines and different loads might work differently.)

xargs

xargs provides the --max-procs command-line option. This is best if the parallel processes can be divided apart based on a single input stream, with either ASCII-NUL-separated or newline-separated input commands. (Well, the -d option lets you pick something else, but these two are common and easy.) This gives you the benefit of using find(1)'s powerful file-selection syntax rather than writing funny expressions like the Makefile example above, or lets your input be completely unrelated to files. (Consider if you had a program for factoring large composite numbers into primes: making that task fit into make would be awkward at best. xargs could do it easily.)

The earlier example might look something like this:

find . -name '*.jpg' -print0 | xargs -0 --max-procs 16 -I {} convert {} -resize 100x100 thumb_{}
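One wrinkle with that one-liner: find prints names like ./cimg01.jpg, so thumb_{} expands to thumb_./cimg01.jpg rather than prefixing the filename itself. A variant that keeps each thumbnail next to its source could use a small sh -c wrapper; in this sketch, cp stands in for the convert invocation so it runs without ImageMagick installed:

```shell
# Variant of the xargs example that prefixes the *basename* with thumb_.
# `cp` stands in for `convert "$1" -resize 100x100 ...` so the sketch
# runs without ImageMagick installed.
dir=$(mktemp -d)
touch "$dir/cimg01.jpg" "$dir/cimg02.jpg"

find "$dir" -name 'cimg*.jpg' -print0 |
  xargs -0 --max-procs 16 -I {} sh -c \
    'cp "$1" "$(dirname "$1")/thumb_$(basename "$1")"' _ {}

ls "$dir"
```

The `_` is sh's $0 placeholder; each filename from find arrives as $1, safely quoted even with spaces in the name.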

parallel

The moreutils package (available at least on Ubuntu) provides the parallel command. It can run in two different ways: either running a specified command on different arguments, or running different commands in parallel. The previous example could look like this:

parallel -i -j 16 convert {} -resize 100x100 thumb_{} -- *.jpg

beanstalkd

The beanstalkd program takes a completely different approach: it provides a message bus for you to submit requests to, and job servers block on jobs being entered, execute the jobs, and then return to waiting for a new job on the queue. If you want to write data back to the specific HTTP request that initiated the job, this might not be very convenient, as you have to provide that mechanism yourself (perhaps a different 'tube' on the beanstalkd server); but if the end result is submitting data into a database, or email, or something similarly asynchronous, this might be the easiest to integrate into your existing application.
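The submit/worker shape beanstalkd provides can be illustrated, very crudely and without beanstalkd itself, with a plain named pipe standing in for a 'tube': the worker blocks reading the queue, and producers simply write job lines into it. This is only an analogy for the blocking-worker pattern, not beanstalkd's protocol:

```shell
# Crude illustration of the beanstalkd submit/worker pattern using a
# FIFO as the "tube" -- not beanstalkd itself, just the same blocking shape.
tube=$(mktemp -u)   # unused temp name to create the FIFO at
mkfifo "$tube"

# Worker: blocks on the queue, handles each job, returns to waiting.
while read -r job; do
    echo "processing $job"
done < "$tube" > "$tube.log" &

# Producer: submit three jobs, then close the tube (EOF stops the worker).
for i in 1 2 3; do
    echo "job$i"
done > "$tube"

wait
cat "$tube.log"
rm "$tube"
```

Real beanstalkd adds what the FIFO lacks: a network server, named tubes, job priorities, and reservations that return a job to the queue if a worker dies mid-task.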
