Python: Creating a streaming gzip'd file-like?

Question

I'm trying to figure out the best way to compress a stream with Python's zlib.

I've got a file-like input stream (input, below) and an output function which accepts a file-like (output_function, below):

```python
with open("file") as input:
    output_function(input)
```

And I'd like to gzip-compress input chunks before sending them to output_function:

```python
with open("file") as input:
    output_function(gzip_stream(input))
```

It looks like the gzip module assumes that either the input or the output will be a gzip'd file on disk, so I assume that the zlib module is what I want.

However, it doesn't natively offer a simple way to create a streaming file-like object, and the stream compression it does support works by manually adding data to a compression buffer and then flushing that buffer.

Of course, I could write a wrapper around zlib.Compress.compress and zlib.Compress.flush (Compress is returned by zlib.compressobj()), but I'd be worried about getting buffer sizes wrong, or something similar.
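For reference, a wrapper of the kind described above can be quite small. The sketch below assumes the input arrives as an iterable of bytes chunks; the generator name gzip_chunks is hypothetical, not a library function. Passing wbits=31 (16 + 15) to zlib.compressobj makes zlib emit a gzip header and trailer instead of a bare zlib stream. No buffer size has to be chosen: compress() returns whatever output is ready (often b""), and flush() in its default Z_FINISH mode returns the rest.

```python
import zlib


def gzip_chunks(chunks, level=6):
    """Lazily gzip-compress an iterable of bytes chunks (hypothetical helper)."""
    # wbits=31 (16 + 15) makes zlib write a gzip header and trailer
    # rather than a bare zlib stream, so the output is a valid .gz file.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # zlib buffers internally and often returns b""
            yield out
    # flush() defaults to Z_FINISH: emit what is left plus the trailer
    yield compressor.flush()
```

Memory use stays roughly bounded by one input chunk plus zlib's internal state, so this copes with inputs that do not fit in memory. Note that it produces an iterator, not a file-like object.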

So, what's the simplest way to create a streaming, gzip-compressing file-like with Python?

Edit: To clarify, the input stream and the compressed output stream are both too large to fit in memory, so something like output_function(StringIO(zlib.compress(input.read()))) doesn't really solve the problem.
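Because output_function expects a file-like object, a lazily produced sequence of compressed chunks still needs a read() method on top of it. One possible sketch, assuming output_function only calls read(): subclass io.RawIOBase, implement readinto(), and wrap the result in io.BufferedReader. IterStream and compressed_chunks are hypothetical names used for illustration.

```python
import io
import zlib


class IterStream(io.RawIOBase):
    """Read-only file-like view over an iterator of bytes chunks (hypothetical)."""

    def __init__(self, chunks):
        super().__init__()
        self._chunks = iter(chunks)
        self._leftover = b""  # part of the current chunk not yet read

    def readable(self):
        return True

    def readinto(self, b):
        # Skip empty chunks; return 0 (EOF) once the iterator is exhausted
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0
        n = min(len(b), len(self._leftover))
        b[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


def compressed_chunks(chunks):
    # Hypothetical helper: gzip-compress an iterable of bytes lazily;
    # wbits=31 asks zlib for a gzip header and trailer.
    compressor = zlib.compressobj(wbits=31)
    for chunk in chunks:
        yield compressor.compress(chunk)
    yield compressor.flush()


# Usage sketch: BufferedReader provides read(size) on top of readinto()
source = (b"line %d\n" % i for i in range(10000))
stream = io.BufferedReader(IterStream(compressed_chunks(source)))
first = stream.read(10)  # input is pulled lazily, not all at once
```

With a real file this would look like output_function(io.BufferedReader(IterStream(compressed_chunks(input)))) with input opened in "rb" mode; only a small read buffer and the current compressed chunk are held in memory at any time.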

Solution

It's quite kludgy (self-referencing, etc.; I only spent a few minutes writing it, so nothing really elegant), but it does what you want if you're still interested in using gzip instead of zlib directly.

Basically, GzipWrap is a (very limited) file-like object that produces a gzipped file out of a given iterable of bytes (e.g., a file-like object opened in binary mode, a list of byte strings, any generator...).

Since it produces binary data, there was no point in implementing readline.

You should be able to expand it to cover other cases or to be used as an iterable object itself.
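As one way of turning the idea into an iterable, the sketch below (gzip_iter is a hypothetical name) lets GzipFile write into a small io.BytesIO scratch buffer, then yields and clears that buffer after each input chunk, so compressed output never piles up beyond what GzipFile and zlib hold internally. It assumes the input chunks are bytes.

```python
import gzip
import io


def gzip_iter(chunks):
    """Generator variant: yield gzip-compressed bytes for an iterable of bytes."""
    sink = io.BytesIO()  # scratch buffer that GzipFile writes into

    def drain():
        # Hand out whatever compressed output has accumulated, then reset
        data = sink.getvalue()
        sink.seek(0)
        sink.truncate()
        return data

    with gzip.GzipFile(fileobj=sink, mode="wb") as zipper:
        for chunk in chunks:
            zipper.write(chunk)
            data = drain()
            if data:
                yield data
    # Leaving the with-block closes the GzipFile, which writes the final
    # deflate block and the CRC/length trailer into sink.
    yield drain()
```

Output pieces can be absent for many consecutive writes and then arrive in bursts, because GzipFile and zlib buffer internally; a consumer should not assume one output piece per input chunk.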

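```python
from gzip import GzipFile


class GzipWrap(object):
    # input is an iterable of bytes chunks (a file opened in 'rb' mode,
    # a list of bytes, any generator...) that feeds the compressor
    def __init__(self, input, filename=None):
        self.input = input
        self.chunks = iter(input)
        self.buffer = b''     # compressed bytes not yet returned by read()
        self.done = False     # True once the gzip trailer has been written
        # GzipFile writes its compressed output back into self.write()
        self.zipper = GzipFile(filename, mode='wb', fileobj=self)

    def read(self, size=-1):
        # Compress more input until `size` bytes are buffered
        # (or, for size < 0, until the input is exhausted)
        while not self.done and (size < 0 or len(self.buffer) < size):
            chunk = next(self.chunks, None)
            if chunk is None:
                self.zipper.close()  # writes the final block and the trailer
                self.done = True
            else:
                self.zipper.write(chunk)
        if size < 0:
            ret, self.buffer = self.buffer, b''
        else:
            ret, self.buffer = self.buffer[:size], self.buffer[size:]
        return ret

    def flush(self):
        pass  # GzipFile.flush() forwards here; nothing to do

    def write(self, data):
        # Called by GzipFile with compressed bytes
        self.buffer += data
        return len(data)

    def close(self):
        # Close the underlying input if it supports closing
        if hasattr(self.input, 'close'):
            self.input.close()
```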
