Python equivalent of piping file output to gzip in Perl using a pipe

Question

I need to figure out how to write file output to a compressed file in Python, similar to the two-liner below:

open ZIPPED, "| gzip -c > zipped.gz";
print ZIPPED "Hello world\n";

In Perl, this pipes everything printed to the ZIPPED filehandle through the Unix gzip command, which writes the compressed result to the file "zipped.gz".

I know how to do this in Python with the gzip module:

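import gzip

# "wb" mode expects bytes; leaving the with block closes the file and writes the gzip trailer.
with gzip.open("zipped.gz", "wb") as zipped:
    zipped.write(b"Hello world\n")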

However, that is extremely slow. According to the profiler, this method accounts for 90% of my run time, since I am writing 200GB of uncompressed data to various output files. I am aware that the file system could be part of the problem, but I want to rule that out by using Unix/Linux compression instead. This is partly because I have heard that decompressing with the same module is slow as well.
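For the decompression side mentioned above, the same pipe idea works in reverse: an external `gzip -dc` decompresses to its stdout and Python only reads plain bytes. The following is a minimal sketch, assuming a `gzip` binary on the PATH; the function name `iter_gzip_lines` is hypothetical, and whether it is actually faster than the gzip module on a given machine has to be measured.

```python
import subprocess


def iter_gzip_lines(path):
    """Yield decompressed lines of `path`, read from an external `gzip -dc`.

    Hypothetical helper: gzip does the decompression in its own process,
    Python just iterates over the uncompressed bytes on the pipe.
    """
    proc = subprocess.Popen(["gzip", "-dc", path], stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:
            yield line
    finally:
        # Closing the pipe early makes gzip exit (SIGPIPE) if the caller stops iterating.
        proc.stdout.close()
        returncode = proc.wait()
    # Only reached when the file was read to the end.
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, proc.args)


if __name__ == "__main__":
    for line in iter_gzip_lines("zipped.gz"):
        print(line)
```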

Answer

ChristopheD's suggestion of using the subprocess module is an appropriate answer to this question. However, it's not clear to me that it will solve your performance problems. You would have to measure the performance of the new code to be sure.
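One way to do that measurement is to time both approaches on the same data. The sketch below is an assumption-laden harness, not part of the original answer: the function names and the workload are hypothetical, and it requires a `gzip` binary on the PATH. It pins the compression level on both sides, because Python's `gzip.open` defaults to `compresslevel=9` while the `gzip` command defaults to `-6`; comparing them at their defaults would partly measure the level difference rather than the implementation.

```python
import gzip
import subprocess
import time


def time_gzip_module(path, chunks, level=6):
    """Seconds taken to compress `chunks` (bytes) with Python's gzip module."""
    start = time.perf_counter()
    with gzip.open(path, "wb", compresslevel=level) as out:
        for chunk in chunks:
            out.write(chunk)
    return time.perf_counter() - start


def time_gzip_pipe(path, chunks, level=6):
    """Seconds taken to compress `chunks` by piping them to an external gzip."""
    start = time.perf_counter()
    with open(path, "wb") as out:
        proc = subprocess.Popen(["gzip", "-c", f"-{level}"],
                                stdin=subprocess.PIPE, stdout=out)
        for chunk in chunks:
            proc.stdin.write(chunk)
        proc.stdin.close()  # EOF: gzip flushes and exits
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical workload: about 20 MB of repetitive text in ~1 MB chunks.
    chunks = [b"Hello world\n" * 87_381] * 20
    print("gzip module:", time_gzip_module("module.gz", chunks))
    print("gzip pipe:  ", time_gzip_pipe("pipe.gz", chunks))
```

Repetitive text compresses unusually fast, so real output data should be used before drawing conclusions.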

To convert your sample code:

import subprocess

# The shell handles "> zipped.gz"; communicate() sends the bytes, closes stdin and waits.
p = subprocess.Popen("gzip -c > zipped.gz", shell=True, stdin=subprocess.PIPE)
p.communicate(b"Hello World\n")
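A variant that avoids `shell=True` is to open the output file in Python and hand it to gzip as its stdout, so no shell has to parse the `> zipped.gz` redirection. This is a sketch under the assumption that `gzip` is on the PATH; `gzip_bytes_to_file` is a hypothetical name.

```python
import subprocess


def gzip_bytes_to_file(path, data):
    """Compress `data` (bytes) into `path` with an external gzip, without a shell.

    Hypothetical helper: Python opens the output file and passes it to
    `gzip -c` as stdout, replacing the shell's "> zipped.gz".
    """
    with open(path, "wb") as out:
        # input= sends all bytes and closes stdin; check=True raises on a non-zero exit.
        subprocess.run(["gzip", "-c"], input=data, stdout=out, check=True)


if __name__ == "__main__":
    gzip_bytes_to_file("zipped.gz", b"Hello World\n")
```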

Since you need to send large amounts of data to the subprocess, you should consider writing it incrementally to the stdin attribute of the Popen object instead of passing everything to communicate() at once. For example:

import subprocess

p = subprocess.Popen("gzip -c > zipped.gz", shell=True, stdin=subprocess.PIPE)
p.stdin.write(b"Some data")  # stdin is a binary pipe, so write bytes

# Write more data here...

p.communicate()  # Close stdin (EOF for gzip) and wait for the subprocess to finish
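The incremental pattern can be wrapped in a helper that streams any iterable of byte chunks, closes stdin to signal end of input, and checks gzip's exit status. This is a minimal sketch, assuming `gzip` is on the PATH and using a list argument instead of a shell string; `stream_to_gzip` is a hypothetical name.

```python
import subprocess


def stream_to_gzip(path, chunks):
    """Pipe an iterable of byte strings through an external `gzip -c` into `path`.

    Hypothetical helper: data is written incrementally, so the full input
    never has to be held in memory. Returns the uncompressed byte count.
    """
    total = 0
    with open(path, "wb") as out:
        proc = subprocess.Popen(["gzip", "-c"], stdin=subprocess.PIPE, stdout=out)
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
                total += len(chunk)
        finally:
            try:
                proc.stdin.close()  # EOF tells gzip to flush its output and exit
            except BrokenPipeError:
                pass  # gzip already exited; its return code is checked below
            returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, proc.args)
    return total


if __name__ == "__main__":
    lines = (b"line %d\n" % i for i in range(100_000))
    print(stream_to_gzip("zipped.gz", lines), "bytes compressed")
```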