How to parallelize big for loops in Python

Question

I am new to Python and still on the steep part of the learning curve. Thanks in advance for any comments.

I have a big for loop to run (big in the sense of many iterations), for example:

for i in range(10000):
    for j in range(10000):
        f((i, j))

I thought that how to parallelize it would be a common question, and after hours of searching on Google I arrived at a solution using the multiprocessing module, as follows:

from multiprocessing import Pool

pool = Pool()
x = pool.map(f, [(i, j) for i in range(10000) for j in range(10000)])

This works when the loop is small. However, it is really slow when the loop is large, and sometimes a memory error occurs if the loops are too big. It seems that Python generates the full list of arguments first and only then feeds it to the function f, even when using xrange. Is that correct?

So this parallelization does not work for me, because I do not really need to store all the arguments in a list. Is there a better way to do this? I would appreciate any suggestions or references. Thank you.

Solution

It seems that Python generates the full list of arguments first and only then feeds it to the function f, even when using xrange. Is that correct?

Yes, because you're using a list comprehension, which explicitly asks it to generate that list.

(Note that xrange isn't really relevant here, because you only have two ranges at a time, each 10K long; compared to the 100M of the argument list, that's nothing.)

If you want it to generate the values on the fly as needed, instead of all 100M at once, you want a generator expression instead of a list comprehension, which is almost always just a matter of turning the square brackets into parentheses:

x = pool.map(f, ((i, j) for i in range(10000) for j in range(10000)))


However, as you can see from the source, map will ultimately just make a list if you give it a generator, so in this case, that won't solve anything. (The docs don't explicitly say this, but it's hard to see how it could pick a good chunksize to chop the iterable into if it didn't have a length…).

And, even if that weren't true, you'd still just run into the same problem again with the results, because pool.map returns a list.

To solve both problems, you can use pool.imap instead. It consumes the iterable lazily, and returns a lazy iterator of results.

One thing to note is that imap does not guess at the best chunksize if you don't pass one; it just defaults to 1, so you may need some thought or trial and error to optimize it.
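For example, here is a minimal sketch of passing an explicit chunksize to imap. The f below is a hypothetical placeholder (the real one isn't shown in the question), and the ranges are smaller than the question's purely so the sketch finishes quickly:

from multiprocessing import Pool

def f(arg):
    i, j = arg  # placeholder workload standing in for the real f
    return i * j

if __name__ == '__main__':
    pool = Pool()
    args = ((i, j) for i in range(1000) for j in range(1000))
    # chunksize=100 is an arbitrary starting point to tune, not a recommendation
    for result in pool.imap(f, args, chunksize=100):
        pass  # results arrive lazily, in the same order as the arguments
    pool.close()
    pool.join()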

Also, imap will still queue up some results as they come in, so it can feed them back to you in the same order as the arguments. In pathological cases, it could end up queuing up (poolsize-1)/poolsize of your results, although in practice this is incredibly rare. If you want to solve this, use imap_unordered. If you need to know the ordering, just pass the indexes back and forth with the args and results:

args = ((i, j) for i in range(10000) for j in range(10000))

def indexed_f(indexed_arg):
    # unpack the (index, (i, j)) tuple produced by enumerate
    index, (i, j) = indexed_arg
    return index, f((i, j))

results = pool.imap_unordered(indexed_f, enumerate(args))
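If you do eventually need the results back in order (and they fit in memory), one simple option is to sort by the carried index at the end, for example:

ordered = [value for index, value in sorted(results)]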


However, I notice that in your original code you're not doing anything at all with the results of f((i, j)), so why even bother gathering them up? In that case, you can just go back to the loop:

for i in range(10000):
    for j in range(10000):
        # apply_async is a method of the pool; f takes a single tuple argument
        pool.apply_async(f, ((i, j),))
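Note that apply_async just queues each task and returns immediately; if you want to block until all of those fire-and-forget tasks have finished, the usual pattern is to close the pool and join it (after which the pool cannot accept more work):

pool.close()  # stop accepting new tasks
pool.join()   # block until every queued task has completed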

However, imap_unordered may still be worth using, because it provides a very easy way to block until all of the tasks are done, while still leaving the pool itself running for later use:

from collections import deque

def consume(iterator):
    # a deque with maxlen=0 exhausts the iterator without storing anything
    deque(iterator, maxlen=0)

x = pool.imap_unordered(f, ((i, j) for i in range(10000) for j in range(10000)))
consume(x)
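Putting it together, a minimal self-contained sketch of that pattern (again with a placeholder f and smaller ranges, purely for illustration):

from collections import deque
from multiprocessing import Pool

def f(arg):
    i, j = arg  # placeholder workload standing in for the real f
    return i * j

def consume(iterator):
    deque(iterator, maxlen=0)  # exhaust the iterator without storing results

if __name__ == '__main__':
    pool = Pool()
    args = ((i, j) for i in range(1000) for j in range(1000))
    consume(pool.imap_unordered(f, args, chunksize=100))
    # the pool is still alive here and can accept more work
    pool.close()
    pool.join()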
