Gzip issue with multiprocessing pool
Problem description
I have a gzip file handle that I'm writing to from a multiprocessing pool. Unfortunately, the output file seems to become corrupted after a certain point, so running something like zcat out | wc gives:
gzip: out: invalid compressed data--format violated
I'm working around the problem by not using gzip, but I'm curious why this happens and whether there is a solution.
Not sure if it matters, but I'm running the code on a remote Linux machine that I don't control; my guess is that it's an Ubuntu machine. Python 2.7.3
Here is the slightly simplified code:
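import gzip
from multiprocessing import Pool, Lock

lock = Lock()
ohandle = gzip.open("out", "w")  # shared handle, opened before the pool is created

def process(fn):
    # Collect the processed lines of one input file.
    rv = []
    for l in open(fn):
        sometext = dosomething(l)
        rv.append(sometext)
    # Write this file's results while holding the lock.
    lock.acquire()
    for sometext in rv:
        print >> ohandle, sometext
    lock.release()

pool = Pool(processes=4)
pm = pool.map(process, some_file_list)
ohandle.close()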
Recommended answer
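What follows is a sketch of one plausible explanation and workaround, not necessarily the answer originally posted. When the pool's workers are forked, each one gets its own copy of the ohandle object, with its own compressor state and buffer, while all copies write to the same underlying file. The lock serializes the print calls, but it does not control when each worker's buffered compressed data actually reaches the file, so fragments from independent compressors can end up interleaved. One way to avoid this is to have the workers only return their text and let the parent process do all the writing. The code below is written for Python 3 (the question uses 2.7.3); the body of dosomething is a hypothetical stand-in, and reading the file list from the command line is an assumption made for illustration.

```python
import gzip
import sys
from multiprocessing import Pool


def dosomething(line):
    # Hypothetical stand-in for the question's dosomething().
    return line.rstrip("\n").upper()


def process(fn):
    # Workers only compute and return text; they never touch the gzip handle.
    with open(fn) as fh:
        return [dosomething(line) for line in fh]


def write_results(results, out_path):
    # A single process owns the gzip handle, so one compressor
    # produces the whole output stream.
    with gzip.open(out_path, "wt") as ohandle:
        for lines in results:
            for text in lines:
                ohandle.write(text + "\n")


if __name__ == "__main__":
    some_file_list = sys.argv[1:]  # assumption: input files passed as arguments
    with Pool(processes=4) as pool:
        # imap yields each file's results in input order as they complete.
        write_results(pool.imap(process, some_file_list), "out")
```

With this layout no lock is needed, because only the parent writes. The trade-off is that each file's results are pickled and sent back to the parent, which may matter if the per-file output is very large.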