How to pass a function with more than one argument to Python's concurrent.futures.ProcessPoolExecutor.map()?


Question

I would like concurrent.futures.ProcessPoolExecutor.map() to call a function that takes two or more arguments. In the example below, I resorted to a lambda function and defined ref as a list the same length as numberlist, with every element holding the same value.

Question 1: Is there a better way of doing this? numberlist can hold millions to billions of elements, and ref has to be the same length, so this approach wastes memory that I would like to save. I did it this way because I read that map stops as soon as the shortest iterable is exhausted.

import concurrent.futures as cf

nmax = 10
numberlist = range(nmax)
ref = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
workers = 3


def _findmatch(listnumber, ref):    
    print('def _findmatch(listnumber, ref):')
    x=''
    listnumber=str(listnumber)
    ref = str(ref)
    print('listnumber = {0} and ref = {1}'.format(listnumber, ref))
    if ref in listnumber:
        x = listnumber
    print('x = {0}'.format(x))
    return x 

a = map(lambda x, y: _findmatch(x, y), numberlist, ref)
for n in a:
    print(n)
    if str(ref[0]) in n:
        print('match')

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    #for n in executor.map(_findmatch, numberlist):
    for n in executor.map(lambda x, y: _findmatch(x, ref), numberlist, ref):
        print(type(n))
        print(n)
        if str(ref[0]) in n:
            print('match')

Running the code above, I found that the built-in map function gave the result I wanted. However, when I passed the same arguments to concurrent.futures.ProcessPoolExecutor.map(), Python 3.5 failed with this error:

Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/queues.py", line 241, in _feed
    obj = ForkingPickler.dumps(obj)
  File "/usr/lib/python3.5/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7fd2a14db0d0>: attribute lookup <lambda> on __main__ failed

Question 2: Why did this error occur, and how do I get concurrent.futures.ProcessPoolExecutor.map() to call a function with more than one argument?

Recommended answer

To answer your second question first: you are getting an exception because a lambda function like the one you're using is not picklable. Python uses the pickle protocol to serialize the data passed between the main process and the ProcessPoolExecutor's worker processes, and pickle stores a function by its name, so an anonymous function that cannot be looked up by name in its module cannot be sent to a worker. It's not clear why you are using a lambda at all. The lambda you had takes two arguments, just like the original function. You could pass _findmatch directly instead of the lambda and it should work:

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_findmatch, numberlist, ref):
        ...
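The pickling failure described above can be reproduced without a process pool. The following is a minimal sketch, not part of the original answer; is_picklable is a hypothetical helper name introduced only for illustration.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == '__main__':
    # A builtin function is pickled by reference to its importable name.
    print(is_picklable(len))
    # A lambda is named '<lambda>', which cannot be looked up on its module,
    # so pickle refuses it. This is the same failure the traceback reports.
    print(is_picklable(lambda x, y: (x, y)))
```

The same check applies to anything handed to ProcessPoolExecutor: the callable and every argument must survive pickle.dumps.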

As for the first question, about passing the second, constant argument without creating a giant list, you could solve this in several ways. One approach is to use itertools.repeat, which creates an iterable that yields the same value forever. Because map stops at the shortest iterable, the call still ends when numberlist is exhausted.

But a better approach would probably be to write an extra function that passes the constant argument for you. (Perhaps this is why you were trying to use a lambda function?) It should work as long as the helper is defined at the module's top level, where pickle can find it by name:

def _helper(x):
    return _findmatch(x, 5)

with cf.ProcessPoolExecutor(max_workers=workers) as executor:
    for n in executor.map(_helper, numberlist):
        ...
