Alternatives to Python Popen.communicate() memory limitations?
Question
I have the following chunk of Python code (running v2.7) that results in MemoryError exceptions being thrown when I work with large (several GB) files:
from subprocess import Popen, PIPE
import sys

myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
myStdout, myStderr = myProcess.communicate()  # buffers both streams fully in memory
sys.stdout.write(myStdout)
if myStderr:
    sys.stderr.write(myStderr)
In reading the documentation for Popen.communicate(), there appears to be some buffering going on:
Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
Is there a way to disable this buffering, or force the cache to be cleared periodically while the process runs?
What alternative approach should I use in Python for running a command that streams gigabytes of data to stdout?
I should note that I need to handle the output and error streams.
Solution
I think I found a solution:
from subprocess import Popen, PIPE
import sys

myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)

# Stream the output line by line instead of buffering it all in memory
for ln in myProcess.stdout:
    sys.stdout.write(ln)
for ln in myProcess.stderr:
    sys.stderr.write(ln)
myProcess.wait()  # reap the child process once both pipes are drained
This seems to get my memory usage down enough to get through the task.
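One caveat worth adding (my note, not part of the original answer): iterating line by line assumes the output is newline-delimited; for binary output a "line" could be arbitrarily long. Reading fixed-size chunks keeps memory bounded regardless. The helper name and chunk size below are my own choices, and the example is written in Python 3 form (`sys.stdout.buffer` for bytes); on Python 2.7 you would write bytes to `sys.stdout` directly.

```python
import sys
from subprocess import Popen, PIPE

def stream_chunks(pipe, sink, size=64 * 1024):
    # Read fixed-size chunks so memory stays bounded even if the
    # output contains no newlines at all (e.g. binary data).
    for chunk in iter(lambda: pipe.read(size), b""):
        sink.write(chunk)
    pipe.close()

# "printf abc" is just a stand-in for a real command producing bulk output.
proc = Popen("printf abc", shell=True, stdout=PIPE)
stream_chunks(proc.stdout, sys.stdout.buffer)
proc.wait()
```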
Update
I have recently found a more flexible way of handling data streams in Python, using threads. It's interesting that Python is so poor at something that shell scripts can do so easily!
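A minimal sketch of that threaded approach, assuming one reader thread per pipe (the command `seq 1 5` is just a placeholder for a real long-running process; `universal_newlines=True` requests text-mode pipes so the same code runs on both Python 2.7 and 3):

```python
import sys
import threading
from subprocess import Popen, PIPE

def drain(pipe, sink):
    # Copy each line from the child's pipe to the sink as it arrives,
    # so neither pipe's OS buffer can fill up and block the child.
    for line in pipe:
        sink.write(line)
    pipe.close()

proc = Popen("seq 1 5", shell=True, stdout=PIPE, stderr=PIPE,
             universal_newlines=True)
readers = [threading.Thread(target=drain, args=(proc.stdout, sys.stdout)),
           threading.Thread(target=drain, args=(proc.stderr, sys.stderr))]
for t in readers:
    t.start()
for t in readers:
    t.join()
proc.wait()
```

Draining stdout and stderr concurrently avoids the deadlock that sequential reads can hit when the child fills one pipe while the parent is blocked reading the other.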