通过多处理对发生器输出进行减少 [英] Apply reduce on generator output with multiprocessing

查看:53
本文介绍了通过多处理对发生器输出进行减少的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个像这样的生成器函数(Python)

I have a generator function (Python) that works like this

def Mygenerator(x, y, z, ...):
    while True:
        # code that makes two matrices based on sequences of input arrays
        yield (matrix1, matrix2)

我要做的是添加此生成器的输出.这行可以完成工作:

What I want to do is to add the output from this generator. This line does the job:

M1, M2 = reduce(lambda x, y: x[0] + y[0], x[1] + y[1], Mygenerator(x, y, z, ...))

我想对此并行化以加快计算速度.重要的是,应减少Mygenerator的输出,因为list(Mygenerator(...))会占用过多的内存.

I would like to parallelize this to speed up the computations. It is important that the outputs from Mygenerator is reduced as it is yielded, since list(Mygenerator(...)) would take too much memory.

推荐答案

要回答我自己的问题,我发现了一个似乎可以实现我所希望的解决方案:

To answer my own question, I found a solution that seems to work as I had hoped:

首先,Mygenerator不再是生成器,而是函数.另外,我现在不遍历x,y和z段,而是将一个段传递给函数:

First, Mygenerator is no longer a generator but a function. Also, instead of looping through segments of x, y and z, I now pass one segment to the function at the time:

def Myfunction(x_segment, y_segment, z_segment):
        # code that makes two matrices based on input arrays
        return (matrix1, matrix2)

multiprocessing.Poolimap(发电机)功能一起使用似乎有效:

Using multiprocessing.Pool with the imap (generator) function seems to work:

pool = multiprocessing.Pool(ncpus)
results = pool.imap(Myfunction, 
                    ( (x[i], y[i], z[i]) for i in range(len(x)) )
M1, M2 = reduce(lambda r1, r2: (r1[0] + r2[0], r1[1] + r2[1]), 
                    (result for result in results))
pool.close()
pool.join()

中,我将lambda表达式中的xy更改为r1r2,以避免与其他同名变量混淆.当尝试使用multiprocessing生成器时,我在使用pickle时遇到了麻烦.

where I changed the x and y in the lambda expression to r1 and r2 to avoid confusion with the other variables with the same name. When trying to use a generator with multiprocessing I got some trouble with pickle.

此解决方案唯一令人失望的是,它并没有真正加快计算速度.我想这与开销操作有关.当使用8核时,处理速度提高了大约10%.当减少到4核时,速度提高了一倍.除非有其他方法可以并行处理,否则这似乎是我可以完成的最好的工作...

The only disappointment with this solution is that it didn't really speed up the computations that much. I guess that has to do with overhead operations. When using 8 cores, the processing speed was increased by approximately 10%. When reducing to 4 cores the speed was doubled. This seems to be the best I can do with my particular task, unless there is some other way of doing the parallelizing...

在这里必须使用imap函数,因为map会在reduce操作之前将所有返回的值存储在内存中,在这种情况下是不可能的.

The imap function was necessary to use here, since map would store all the returned values in memory before the reduce operation, and in this case that would not be possible.

这篇关于通过多处理对发生器输出进行减少的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆