Python's multiprocessing and memory

Problem description

I am using multiprocessing.imap_unordered to perform a computation on a list of values:

import multiprocessing

def process_parallel(fnc, some_list):
    pool = multiprocessing.Pool()
    # imap_unordered yields results in whatever order workers finish them
    for result in pool.imap_unordered(fnc, some_list):
        for x in result:
            yield x
    pool.terminate()

Each call to fnc returns a HUGE object as a result, by design. I can store N instances of such an object in RAM, where N ~ cpu_count, but not much more (certainly not hundreds).

Now, using this function takes up too much memory. The memory is entirely spent in the main process, not in the workers.

How does imap_unordered store the finished results? I mean the results that workers have already returned but that have not yet been passed on to the user. I thought it was smart and only computed them "lazily" as needed, but apparently not.

It looks like, since I cannot consume the results of process_parallel fast enough, the pool keeps queueing these huge objects from fnc somewhere internally and then blows up. Is there a way to avoid this? Can I limit its internal queue somehow?

I'm using Python 2.7. Cheers.

Answer

As you can see by looking into the corresponding source file (python2.7/multiprocessing/pool.py), the IMapUnorderedIterator uses a collections.deque instance to store the results. Whenever a worker finishes a task, the pool's result-handler thread appends the result to that deque, and your iteration pops items off as it consumes them.
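
To see why memory grows, the machinery looks roughly like this (a simplified paraphrase, not the exact CPython source; the real code also passes results as success/value pairs): the result-handler thread calls _set() for every finished task, and your for-loop calls next().

import collections
import threading

class IMapUnorderedIteratorSketch(object):
    # Simplified paraphrase of multiprocessing.pool.IMapUnorderedIterator.
    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._items = collections.deque()   # unbounded result buffer

    def _set(self, i, obj):
        # Called by the pool's result-handler thread for each finished
        # task. It never blocks, so the deque can grow without limit.
        self._cond.acquire()
        try:
            self._items.append(obj)
            self._cond.notify()
        finally:
            self._cond.release()

    def next(self):
        # Called by the consuming for-loop; waits until a result is ready.
        self._cond.acquire()
        try:
            while not self._items:
                self._cond.wait()
            return self._items.popleft()
        finally:
            self._cond.release()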

As you suspected, if more huge objects arrive while the main thread is still processing one, they are all buffered in memory as well.

您可能会尝试以下操作:

What you might try is something like this:

it = pool.imap_unordered(fnc, some_list)
for result in it:
    # Hold the iterator's internal condition lock while processing, so the
    # result-handler thread cannot append further results to the deque.
    it._cond.acquire()
    for x in result:
        yield x
    it._cond.release()

This should block the task-result-receiver thread while you process an item, whenever it is trying to put the next object into the deque. Thus there should never be more than two of the huge objects in memory at once. Whether that works for your case, I don't know ;)
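
If touching the iterator's private _cond attribute feels too fragile, a different way to cap memory (my own sketch, not part of the original answer; process_parallel_chunked and max_in_flight are made-up names) is to feed the pool in small batches, so at most one batch of results can ever be buffered:

import itertools
import multiprocessing

def process_parallel_chunked(fnc, some_list, max_in_flight=None):
    # Hypothetical alternative: submit the work in batches of roughly
    # cpu_count tasks, so the internal result deque never holds more
    # than one batch of huge objects at a time.
    pool = multiprocessing.Pool()
    n = max_in_flight or multiprocessing.cpu_count()
    it = iter(some_list)
    while True:
        chunk = list(itertools.islice(it, n))
        if not chunk:
            break
        for result in pool.imap_unordered(fnc, chunk):
            for x in result:
                yield x
    pool.close()
    pool.join()

The trade-off is that workers idle at each batch boundary while the slowest task in the batch finishes, so this trades some parallelism for bounded memory.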
