multiprocessing and garbage collection

Question
In py2.6+, the multiprocessing module offers a Pool class, so one can do:
class Volatile(object):
    def do_stuff(self, ...):
        pool = multiprocessing.Pool()
        return pool.imap(...)
However, with the standard Python implementation at 2.7.2, this approach soon leads to "IOError: [Errno 24] Too many open files". Apparently the pool object never gets garbage collected, so its processes never terminate, accumulating whatever descriptors are opened internally. I think this is because the following works:
class Volatile(object):
    def do_stuff(self, ...):
        pool = multiprocessing.Pool()
        result = pool.map(...)
        pool.terminate()
        return result
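One caveat with the eager version above: if pool.map raises, pool.terminate() is never reached and the worker processes still leak. A try/finally makes the cleanup unconditional. This is a sketch, not part of the original answer; the worker function (abs here) and the pool size are placeholders:

```python
import multiprocessing

class Volatile(object):
    def do_stuff(self, items):
        pool = multiprocessing.Pool(2)
        try:
            # eager: blocks until all results are computed
            return pool.map(abs, items)
        finally:
            # runs even if pool.map raises, so workers are always cleaned up
            pool.terminate()
```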
I would like to keep the "lazy" iterator approach of imap; how does the garbage collector work in that case? How can I fix the code?
In the end, I ended up passing the pool reference around and terminating it manually once the pool.imap iterator was finished:
class Volatile(object):
    def do_stuff(self, ...):
        pool = multiprocessing.Pool()
        return pool, pool.imap(...)

    def call_stuff(self):
        pool, results = self.do_stuff()
        for result in results:
            pass  # lazy evaluation of the imap
        pool.terminate()
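An alternative to passing the pool reference around is to wrap the whole thing in a generator that owns the pool: the finally block terminates the workers once the iterator is exhausted (or closed early by the caller). This is a sketch under the same assumptions; imap_and_terminate and its parameters are hypothetical names, not from the original answer:

```python
import multiprocessing

def imap_and_terminate(func, items, processes=2):
    """Yield results lazily, terminating the pool when iteration ends."""
    pool = multiprocessing.Pool(processes)
    try:
        # pool.imap is lazy: results are yielded as they become ready
        for result in pool.imap(func, items):
            yield result
    finally:
        # also runs if the caller abandons or closes the generator early
        pool.terminate()
```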
In case anyone stumbles upon this solution in the future: the chunksize parameter is very important in Pool.imap (as opposed to plain Pool.map, where it didn't matter). I set it manually so that each process receives 1 + len(input) / len(pool) jobs. Leaving it at the default chunksize=1 gave me the same performance as if I didn't use parallel processing at all... bad.
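The chunksize heuristic above can be written out as follows. Since the pool's worker count is not part of the public API, this sketch takes it as an explicit parameter; parallel_imap and the argument names are illustrative, not from the original answer:

```python
import multiprocessing

def parallel_imap(func, inputs, processes=2):
    # Hand each worker roughly one large chunk instead of the default
    # chunksize=1, which ships items one at a time and drowns the actual
    # work in inter-process communication overhead.
    chunksize = 1 + len(inputs) // processes
    pool = multiprocessing.Pool(processes)
    try:
        for result in pool.imap(func, inputs, chunksize=chunksize):
            yield result
    finally:
        pool.terminate()
```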
I guess there's no real benefit to using ordered imap vs. ordered map; I just personally like iterators better.