Python Gzip - 即时附加到文件 [英] Python Gzip - Appending to file on the fly

查看:25
本文介绍了Python Gzip - 即时附加到文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

是否可以使用 Python 即时附加到 gzip 压缩的文本文件?

Is it possible to append to a gzipped text file on the fly using Python ?

基本上我是这样做的:-

Basically I am doing this:-

import gzip
content = "Lots of content here"
f = gzip.open('file.txt.gz', 'a', 9)
f.write(content)
f.close()

每隔 6 秒左右就会向文件附加一行(注意附加"),但生成的文件与标准的未压缩文件一样大(完成后大约 1MB).

A line is appended (note "appended") to the file every 6 seconds or so, but the resulting file is just as big as a standard uncompressed file (roughly 1MB when done).

显式指定压缩级别似乎也没有什么区别.

Explicitly specifying the compression level does not seem to make a difference either.

如果我之后对现有的未压缩文件进行 gzip,它的大小会下降到大约 80kb.

If I gzip an existing uncompressed file afterwards, it's size comes down to roughly 80kb.

我猜它不可能动态附加"到一个 gzip 文件并压缩它?

Im guessing its not possible to "append" to a gzip file on the fly and have it compress ?

这是写入 String.IO 缓冲区然后在完成后刷新到 gzip 文件的情况吗?

Is this a case of writing to a String.IO buffer and then flushing to a gzip file when done ?

推荐答案

这在创建和维护有效 gzip 文件的意义上起作用,因为 gzip 格式允许连接 gzip 流.

That works in the sense of creating and maintaining a valid gzip file, since the gzip format permits concatenated gzip streams.

然而,它在压缩糟糕的意义上不起作用,因为你给每个 gzip 压缩实例提供了很少的数据来处理.压缩依赖于利用以前数据的历史,但这里 gzip 基本上没有.

However it doesn't work in the sense that you get lousy compression, since you are giving each instance of gzip compression so little data to work with. Compression depends on taking advantage the history of previous data, but here gzip has been given essentially none.

在调用 gzip 将另一个 gzip 流添加到文件之前,您可以 a) 累积至少几 K 数据,许多行,或者 b) 执行一些更复杂的操作,附加到单个 gzip 流,每次都留下一个有效的 gzip 流并允许有效地压缩数据.

You could either a) accumulate at least a few K of data, many of your lines, before invoking gzip to add another gzip stream to the file, or b) do something much more sophisticated that appends to a single gzip stream, leaving a valid gzip stream each time and permitting efficient compression of the data.

你在 C 中找到了一个关于 b) 的例子,在 gzlog.hgzlog.c.我不相信 Python 具有直接在 Python 中实现 gzlog 所需的所有 zlib 接口,但您可以从 Python 连接到 C 代码.

You find an example of b) in C, in gzlog.h and gzlog.c. I do not believe that Python has all of the interfaces to zlib needed to implement gzlog directly in Python, but you could interface to the C code from Python.

这篇关于Python Gzip - 即时附加到文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆