Parallel for loop: map() works, pool.map() gives TypeError


Problem description


I am making a condensed (upper-right only) distance matrix. The calculation of each distance takes some time, so I want to parallelise the for loop. The unparallelised loop looks like

import itertools

spectra_names, condensed_distance_matrix, index_0 = [], [], 0
for index_1, index_2 in itertools.combinations(range(len(clusters)), 2):
    if index_0 == index_1:
        index_0 += 1
        spectra_names.append(clusters[index_1].get_names()[0])
    try:
        distance = 1 / float(compare_clusters(clusters[index_1], clusters[index_2], maxiter=50))
    except:
        # comparison failed or returned 0; fall back to a large distance
        distance = 10
    condensed_distance_matrix.append(distance)


where clusters is a list of objects to compare, compare_clusters() is a likelihood function and 1/compare_clusters() is the distance between two objects.


I tried to parallelise it by moving the distance function out of the loop, like so

import itertools
from multiprocessing import Pool

condensed_distance_matrix = []
spectra_names = []
index_0 = 0
clusters_1 = []
clusters_2 = []
for index_1, index_2 in itertools.combinations(range(len(clusters)), 2):
    if index_0 == index_1:
        index_0 += 1
        spectra_names.append(clusters[index_1].get_names()[0])
    clusters_1.append(clusters[index_1])
    clusters_2.append(clusters[index_2])

pool = Pool()
condensed_distance_matrix_values = pool.map(compare_clusters, clusters_1, clusters_2)

for value in condensed_distance_matrix_values:
    try:
        distance = 1/float(value)
    except:
        distance = 10
    condensed_distance_matrix.append(distance)


Before parallelising I tried the same code, but with map() instead of pool.map(). This worked as I wanted. However, when using pool.map() I get the error

  File "C:\Python27\lib\multiprocessing\pool.py", line 225, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\Python27\lib\multiprocessing\pool.py", line 288, in map_async
    result = MapResult(self._cache, chunksize, len(iterable), callback)
  File "C:\Python27\lib\multiprocessing\pool.py", line 551, in __init__
    self._number_left = length//chunksize + bool(length % chunksize)
TypeError: unsupported operand type(s) for //: 'int' and 'list'

What am I missing here?

Answer

From the documentation for Pool.map:


A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready.


For ordinary map, you can supply multiple iterables. For example,

>>> map(lambda x,y: x+y, "ABC", "DEF")
['AD', 'BE', 'CF']


But you can't do this with Pool.map: its third argument is interpreted as chunksize, so your clusters_2 list ends up where an int is expected. That is exactly the TypeError in your traceback.


Perhaps you could pass in only a single iterable, by combining your lists:

pool.map(lambda (a,b): compare_clusters(a,b), zip(clusters_1, clusters_2))


I haven't tested it with pool.map, but this strategy works for ordinary map.

>>> map(lambda (a,b): a+b, zip("ABC", "DEF"))
['AD', 'BE', 'CF']

