multiprocessing and garbage collection


Problem description

In py2.6+, the multiprocessing module offers a Pool class, so one can do:

class Volatile(object):
    def do_stuff(self, ...):
        pool = multiprocessing.Pool()
        return pool.imap(...)

However, with the standard Python implementation (CPython) at 2.7.2, this approach soon leads to "IOError: [Errno 24] Too many open files". Apparently the pool object never gets garbage collected, so its processes never terminate and keep accumulating whatever descriptors are opened internally. I think this is because the following works:

class Volatile(object):
    def do_stuff(self, ...):
        pool = multiprocessing.Pool()
        result = pool.map(...)
        pool.terminate()
        return result
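(A side note of my own, not part of the original question: terminate() kills the workers immediately and discards any pending work; when you want queued tasks to drain first, the conventional shutdown is close() followed by join(). A minimal sketch, with square and inputs as placeholder names:)

import multiprocessing

def square(x):
    # stand-in for the real work function
    return x * x

if __name__ == '__main__':
    inputs = range(100)
    pool = multiprocessing.Pool()
    results = pool.map(square, inputs)
    pool.close()  # no further tasks may be submitted
    pool.join()   # wait for the workers to exit cleanly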

I would like to keep the "lazy" iterator approach of imap; how does the garbage collector work in that case? How can I fix the code?

Solution

In the end, I passed the pool reference around and terminated it manually once the pool.imap iterator was finished:

class Volatile(object):
    def do_stuff(self, ...):
        pool = multiprocessing.Pool()
        return pool, pool.imap(...)

    def call_stuff(self):
        pool, results = self.do_stuff()
        for result in results:
            # lazy evaluation of the imap; process each result here
            pass
        pool.terminate()
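A variation on the same idea (my own sketch, not from the original answer; the helper name imap_with_cleanup is hypothetical) is to hide the pool inside a generator, so a try/finally terminates it once iteration finishes or the iterator is discarded:

import multiprocessing

def imap_with_cleanup(func, iterable, chunksize=1):
    # Hypothetical helper: wraps Pool.imap so the pool is terminated
    # when the results are exhausted or the generator is closed.
    pool = multiprocessing.Pool()
    try:
        for result in pool.imap(func, iterable, chunksize):
            yield result
    finally:
        pool.terminate()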


In case anyone stumbles upon this solution in the future: the chunksize parameter is very important in Pool.imap (as opposed to plain Pool.map, where it didn't matter). I set it manually so that each process receives 1 + len(input) / len(pool) jobs. Leaving it at the default of chunksize=1 gave me the same performance as if I hadn't used parallel processing at all... bad.
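For concreteness, a minimal sketch of that heuristic (square and the input size are illustrative; cpu_count() stands in for the pool size, since Pool() starts one worker per CPU by default):

import multiprocessing

def square(x):
    # stand-in for the real work function
    return x * x

if __name__ == '__main__':
    inputs = range(1000)
    nprocs = multiprocessing.cpu_count()   # default Pool() worker count
    chunksize = 1 + len(inputs) // nprocs  # ~ 1 + len(input) / len(pool)
    pool = multiprocessing.Pool()
    for result in pool.imap(square, inputs, chunksize=chunksize):
        pass  # consume the results lazily
    pool.terminate()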

I guess there's no real benefit to using ordered imap vs. ordered map; I just personally like iterators better.
