Alternatives to Python Popen.communicate() memory limitations?

Question

I have the following chunk of Python code (running v2.7) that results in MemoryError exceptions being thrown when I work with large (several GB) files:

from subprocess import Popen, PIPE
import sys

# communicate() buffers the child's entire stdout/stderr in memory.
myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
myStdout, myStderr = myProcess.communicate()
sys.stdout.write(myStdout)
if myStderr:
    sys.stderr.write(myStderr)

In reading the documentation for Popen.communicate(), there appears to be some buffering going on:

Note The data read is buffered in memory, so do not use this method if the data size is large or unlimited.

Is there a way to disable this buffering, or force the cache to be cleared periodically while the process runs?

What alternative approach should I use in Python for running a command that streams gigabytes of data to stdout?

I should note that I need to handle output and error streams.

Answer

I think I found a solution:

from subprocess import Popen, PIPE
import sys

myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
# Iterate over the pipes so only one line is held in memory at a time.
for ln in myProcess.stdout:
    sys.stdout.write(ln)
for ln in myProcess.stderr:
    sys.stderr.write(ln)

This seems to get my memory usage down enough to get through the task.

Update

I have recently found a more flexible way of handling data streams in Python, using threads. It's interesting that Python is so poor at something that shell scripts can do so easily!
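
The update gives no code, so here is a minimal sketch of what such a thread-based reader might look like (the pump helper and the placeholder command are illustrative, not from the original answer; written for Python 2.7 to match the question). The motivation: draining stdout to completion before touching stderr, as in the loops above, can deadlock if the child fills its stderr pipe first, whereas reading both pipes on separate threads keeps either one from blocking the child.

import sys
import threading
from subprocess import Popen, PIPE

def pump(src, dst):
    # Forward the child's output one line at a time, so at most
    # one line per stream is held in memory.
    for line in iter(src.readline, b''):
        dst.write(line)
    src.close()

# Placeholder command for illustration; substitute your own pipeline.
myCmd = 'gzip -dc huge_file.gz'

myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)

# One thread per pipe, so neither pipe's OS buffer can fill up and
# stall the child while we are busy reading the other one.
readers = [
    threading.Thread(target=pump, args=(myProcess.stdout, sys.stdout)),
    threading.Thread(target=pump, args=(myProcess.stderr, sys.stderr)),
]
for t in readers:
    t.start()
for t in readers:
    t.join()
myProcess.wait()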
