Using the multiprocessing.Pool.map() function with keyword arguments?


Question

I am trying to pass the keyword arguments to the map function in Python's multiprocessing.Pool instance.

Extrapolating from Using map() function with keyword arguments, I know I can use functools.partial() such as the following:

from multiprocessing import Pool
from functools import partial
import sys

# Function to multiprocess
def func(a, b, c, d):
    print(a * (b + 2 * c - d))
    sys.stdout.flush()

if __name__ == '__main__':
    p = Pool(2)
    # Now, I try to call func(a, b, c, d) for 10 different a values,
    # but the same b, c, d values passed in as keyword arguments
    a_iter = range(10)
    kwargs = {'b': 1, 'c': 2, 'd': 3}

    mapfunc = partial(func, **kwargs)
    p.map(mapfunc, a_iter)

The output is correct:

0
2
4
6
8
10
12
14
16
18

Is this the best practice (most "pythonic" way) to do so? I felt that:

1) Pool is commonly used;

2) Keyword arguments are commonly used;

3) But the combined usage like my example above is a little bit like a "hacky" way to achieve this.

Answer

Using partial may be suboptimal if the default arguments are large. The function passed to map is repeatedly pickled when sent to the workers (once for every argument in the iterable); a global Python function is (essentially) pickled by sending its qualified name (no data needs to be transferred, because the same function is defined on the receiving side), while a partial is pickled as the pickle of the underlying function plus all of the provided arguments.

If the kwargs are all small primitives, as in your example, this doesn't really matter; the incremental cost of sending the extra arguments along is trivial. But if kwargs is big, say, kwargs = {'b': [1] * 10000, 'c': [2] * 20000, 'd': [3] * 30000}, that's a nasty price to pay.
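To make that cost concrete, here is a small illustration (toy sizes; the variable names are mine, not from the answer) comparing the pickle payload of a bare global function with that of a partial binding large lists:

```python
import pickle
from functools import partial

def func(a, b, c, d):
    return a * (b + 2 * c - d)

# A global function pickles as (essentially) just its qualified name.
plain = pickle.dumps(func)

# A partial pickles the function reference plus every bound argument.
bound = pickle.dumps(partial(func, b=[1] * 10000, c=[2] * 20000, d=[3] * 30000))

print(len(plain), len(bound))  # the partial's payload is vastly larger
```

Remember that map re-pickles the callable for each chunk of work it dispatches, so this difference is paid repeatedly, not once.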

In that case, you have some options:

  1. Roll your own function at the global level that works like partial, but pickles differently:

def func_a_only(a):
    return func(a, 1, 2, 3)

  2. Use the initializer argument to Pool so each worker process sets up its state once, instead of once per task; this lets you ensure the data is available even in a spawn-based environment (e.g. Windows).

  3. Use Managers to share a single copy of the data among all processes.

And probably a handful of other approaches. The point is, partial is fine for arguments that don't produce huge pickles, but it can kill you if the bound arguments are huge.
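A minimal sketch of the Managers route (the name func_via_proxy is mine): the Manager process holds the one authoritative copy of the data, and each worker receives only a lightweight proxy, so the per-task pickles stay small.

```python
from functools import partial
from multiprocessing import Manager, Pool

def func_via_proxy(shared, a):
    # 'shared' is a dict-like proxy; each lookup is a round-trip to the
    # manager process, so this trades pickle size for access latency.
    return a * (shared['b'] + 2 * shared['c'] - shared['d'])

if __name__ == '__main__':
    with Manager() as m:
        shared = m.dict({'b': 1, 'c': 2, 'd': 3})
        with Pool(2) as p:
            print(p.map(partial(func_via_proxy, shared), range(10)))
```

This pays per-access IPC overhead instead of per-task pickling overhead, so it suits data that is large but read sparingly.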

Note: In this particular case, if you're on Python 3.3+, you don't actually need partial, and avoiding the dict in favor of tuples saves a trivial amount of overhead. Without adding any new functions, just an import, you could replace:

kwargs = {'b': 1, 'c': 2, 'd': 3}
mapfunc = partial(func, **kwargs)
p.map(mapfunc, a_iter)
    

with:

from itertools import repeat

p.starmap(func, zip(a_iter, repeat(1), repeat(2), repeat(3)))
    

to achieve a similar effect. To be clear, there is nothing wrong with partial that this "fixes" (both approaches would have the same problem with pickling large objects); it's just an alternate approach that is occasionally useful.

