How to spawn parallel child processes on a multi-processor system?


Problem Description

I have a Python script that I want to use as a controller to another Python script. I have a server with 64 processors, so want to spawn up to 64 child processes of this second Python script. The child script is called:

$ python create_graphs.py --name=NAME

where NAME is something like XYZ, ABC, NYU etc.

In my parent controller script I retrieve the name variable from a list:

my_list = [ 'XYZ', 'ABC', 'NYU' ]

So my question is, what is the best way to spawn off these processes as children? I want to limit the number of children to 64 at a time, so need to track the status (if the child process has finished or not) so I can efficiently keep the whole generation running.
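The cap-and-track requirement above can be sketched with a process pool whose size is the concurrency limit: the pool keeps at most that many workers busy and hands each one the next name as soon as a child finishes. This is a minimal sketch, not the asker's actual setup — the inline `-c` child is a hypothetical stand-in for `create_graphs.py` so the example runs anywhere; in the real setup the command would be `["python", "/path/to/create_graphs.py", "--name=" + name]`.

```python
import multiprocessing
import subprocess
import sys

def run_child(name):
    # Launch one child process and block until it exits.
    # Stand-in command: echoes the name and exits 0. The real command
    # would be ["python", "/path/to/create_graphs.py", "--name=" + name].
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", name]
    return subprocess.call(cmd)

if __name__ == "__main__":
    my_list = ["XYZ", "ABC", "NYU"]
    # Never run more than 64 children at once, and no more than the
    # machine has cores.
    limit = min(64, multiprocessing.cpu_count())
    with multiprocessing.Pool(limit) as pool:
        # map() blocks until every name is processed; each worker runs
        # one child at a time, so at most `limit` children are alive
        # simultaneously.
        exit_codes = pool.map(run_child, my_list)
    print(exit_codes)  # one exit status per name, in order
```

The pool does the status tracking implicitly: a worker only picks up the next name after its current child exits, so no manual bookkeeping of finished children is needed.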

I looked into using the subprocess package, but rejected it because it only spawns one child at a time. I finally found the multiprocessing package, but I admit to being overwhelmed by the whole threads vs. subprocesses documentation.

Right now, my script uses subprocess.call to only spawn one child at a time and looks like this:

#!/path/to/python
import subprocess, multiprocessing, Queue
from multiprocessing import Process

my_list = [ 'XYZ', 'ABC', 'NYU' ]

if __name__ == '__main__':
    processors = multiprocessing.cpu_count()

    for i in range(len(my_list)):
        if i < processors:
            cmd = ["python", "/path/to/create_graphs.py", "--name=" + my_list[i]]
            child = subprocess.call(cmd, shell=False)

I really want it to spawn up 64 children at a time. In other stackoverflow questions I saw people using Queue, but it seems like that creates a performance hit?

Recommended Answer

What you are looking for is the process pool class in multiprocessing.

And here is a calculation example to make it easier to understand. The following will divide 10000 tasks over N processes, where N is the CPU count. Note that `None` is passed as the number of processes, which causes the `Pool` class to use `cpu_count()` for the number of processes.

import multiprocessing

def calculate(value):
    return value * 10

if __name__ == '__main__':
    pool = multiprocessing.Pool(None)
    tasks = range(10000)
    results = []
    r = pool.map_async(calculate, tasks, callback=results.append)
    r.wait()  # Wait on the results
    print(results)
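One subtlety worth noting in this pattern: the `callback` passed to `map_async` is invoked once with the *complete* result list, not once per task, so `results.append` leaves `results` as a list containing one list. A minimal Python 3 sketch of both ways of reading the results (`run_demo` is a helper added here for illustration):

```python
import multiprocessing

def calculate(value):
    return value * 10

def run_demo():
    with multiprocessing.Pool(None) as pool:
        results = []
        r = pool.map_async(calculate, range(5), callback=results.append)
        r.wait()  # block until every task has finished
    # The callback received the whole result list once, so results is nested.
    return results, r.get()

if __name__ == "__main__":
    nested, flat = run_demo()
    print(nested)  # [[0, 10, 20, 30, 40]] -- note the extra nesting
    print(flat)    # [0, 10, 20, 30, 40]
```

If a flat list is all that's needed, `r.get()` (or plain `pool.map`) is simpler than collecting through a callback.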
