Parallelizing a dictionary comprehension


Question


I have the following function and dictionary comprehension:

def function(name, params):
    results = fits.open(name)
    # <do something more to results>
    return results

dictionary = {name: function(name, params) for name in nameList}


and would like to parallelize this. Any simple way to do this?

Here I have seen that the multiprocessing module can be used, but I could not understand how to pass the results into my dictionary.


NOTE: If possible, please give an answer that can be applied to any function that returns a result.

NOTE 2: the function mainly manipulates the FITS file and assigns the results to a class.

UPDATE

So here's what worked for me in the end (from @code_onkel's answer):

import multiprocessing
from astropy.io import fits  # assuming fits.open() comes from astropy

def function(name, params):
    results = fits.open(name)
    # <do something more to results>
    return results

def function_wrapper(args):
    return function(*args)

params = [...,...,..., etc]

p = multiprocessing.Pool(processes=max(2, multiprocessing.cpu_count() // 10))
args_generator = ((name, params) for name in names)

dictionary = dict(zip(names, p.map(function_wrapper, args_generator)))

Using tqdm only worked partially, since I couldn't use my custom bar: tqdm reverts to a default bar that shows only the iterations.

Answer

The dictionary comprehension itself cannot be parallelized. Here is an example of how to use the multiprocessing module with Python 2.7.

from __future__ import print_function
import time
import multiprocessing

params = [0.5]

def function(name, params):
    print('sleeping for', name)
    time.sleep(params[0])
    return time.time()

def function_wrapper(args):
    # unpack the (name, params) tuple; Pool.map() passes a single argument
    return function(*args)

names = list('onecharNAmEs')

p = multiprocessing.Pool(3)
args_generator = ((name, params) for name in names)
dictionary = dict(zip(names, p.map(function_wrapper, args_generator)))
print(dictionary)
p.close()

This works with any function, though the restrictions of the multiprocessing module apply (see the programming guidelines in the multiprocessing documentation). Most importantly, the classes passed as arguments and return values, as well as the function to be parallelized itself, have to be defined at the module level, otherwise the (de)serializer will not find them. The wrapper function is necessary since function() takes two arguments, but Pool.map() can only handle functions with one argument (like the built-in map() function).

With Python >= 3.3 this can be simplified by using the Pool as a context manager and the starmap() function.

from __future__ import print_function
import time
import multiprocessing

params = [0.5]

def function(name, params):
    print('sleeping for', name)
    time.sleep(params[0])
    return time.time()

names = list('onecharnamEs')

with multiprocessing.Pool(3) as p:
    args_generator = ((name, params) for name in names)
    dictionary = dict(zip(names, p.starmap(function, args_generator)))

print(dictionary)


This is a more readable version of the with block:

with multiprocessing.Pool(3) as p:
    args_generator = ((name, params) for name in names)
    results = p.starmap(function, args_generator)
    name_result_tuples = zip(names, results)
    dictionary = dict(name_result_tuples)

The Pool.map() function is for functions with a single argument, which is why the Pool.starmap() function was added in 3.3.

