python, subprocess: launch new process when one (in a group) has terminated

Problem Description

I have n files to analyze separately and independently of each other with the same Python script analysis.py. In a wrapper script, wrapper.py, I loop over those files and call analysis.py as a separate process with subprocess.Popen:

import shlex
import subprocess

for a_file in all_files:
    command = "python analysis.py %s" % a_file
    analysis_process = subprocess.Popen(shlex.split(command),
                                        stdout=subprocess.PIPE,
                                        stderr=subprocess.PIPE)
    # wait() blocks until the process finishes, so the files are
    # currently processed one after another
    analysis_process.wait()


Now, I would like to use all k CPU cores of my machine in order to speed up the whole analysis. Is there a way to always have k-1 processes running as long as there are files left to analyze?
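For reference, one way to do this with subprocess alone is to keep a list of running Popen handles and top it up as processes finish. This is a minimal sketch, assuming the same all_files list and analysis.py interface as above (the stdout/stderr pipes are omitted to keep it short):

import shlex
import subprocess
import time

k = 4  # number of CPU cores on the machine
pending = list(all_files)
running = []

while pending or running:
    # drop processes that have finished (poll() returns None while running)
    running = [p for p in running if p.poll() is None]
    # start new processes until k - 1 are running
    while pending and len(running) < k - 1:
        command = "python analysis.py %s" % pending.pop(0)
        running.append(subprocess.Popen(shlex.split(command)))
    time.sleep(0.1)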

Recommended Answer

This outlines how to use multiprocessing.Pool, which exists exactly for these tasks:

from multiprocessing import Pool, cpu_count

# ...
all_files = ["file%d" % i for i in range(5)]


def process_file(file_name):
    # process file
    return "finished file %s" % file_name


if __name__ == "__main__":
    # one worker per CPU core
    pool = Pool(cpu_count())

    # this is a blocking call - when it's done, all files have been processed
    results = pool.map(process_file, all_files)

    # no more tasks can go into the pool
    pool.close()

    # wait for all workers to exit (map already blocked until all results were in)
    pool.join()

    # ['finished file file0', 'finished file file1', ..., 'finished file file4']
    print(results)
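To tie this back to the original subprocess setup: process_file can itself launch analysis.py as a child process, and the pool then simply caps how many run at once. A sketch, assuming analysis.py takes the file path as its only argument:

import shlex
import subprocess


def process_file(file_name):
    command = "python analysis.py %s" % file_name
    # run the analysis in a child process and wait for it to finish
    proc = subprocess.Popen(shlex.split(command),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return "finished file %s" % file_name

communicate() is used instead of wait() here so the pipes are drained; with wait(), a child that writes a lot of output can fill the pipe buffer and deadlock.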

Adding Joel's comment, which mentions a common pitfall:

Make sure that the function you pass to pool.map() only uses objects defined at the module level. Python multiprocessing uses pickle to pass objects between processes, and pickle has trouble with things like functions defined in a nested scope.
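For example, a function defined inside another function cannot be pickled, so a pool worker cannot receive it (a minimal illustration of the pitfall):

from multiprocessing import Pool


def main():
    def process_file(file_name):  # nested scope: cannot be pickled
        return file_name

    pool = Pool(2)
    # fails: pickle cannot serialize main.<locals>.process_file
    print(pool.map(process_file, ["file0", "file1"]))


if __name__ == "__main__":
    main()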

See the pickle documentation for what can and cannot be pickled.
