What is the difference between incremental and one-shot compression?


Question

I am trying to use the bz2 and/or lzma packages in Python. I want to compress a database dump in CSV format and then put it into a zip file. I got this to work with one-shot compression using both packages.

The code:

import bz2
from zipfile import ZipFile, ZIP_DEFLATED

with ZipFile('something.zip', 'w') as zf:
    content = bz2.compress(bytes(csv_string, 'UTF-8'))  # also with lzma
    zf.writestr(
        'something.csv' + '.bz2',
        content,
        compress_type=ZIP_DEFLATED
    )

When I try to use incremental compression, it creates a .zip file that, when I try to extract it, keeps producing another archive file recursively.

The code:

with ZipFile('something.zip', 'w') as zf:
    compressor = bz2.BZ2Compressor()
    content = compressor.compress(bytes(csv_string, 'UTF-8'))  # also with lzma
    zf.writestr(
        'something.csv' + '.bz2',
        content,
        compress_type=ZIP_DEFLATED
    )
    compressor.flush()

I went through the documentation and also looked for information about compression techniques, and there seems to be no comprehensive explanation of what one-shot and incremental compression are.

Recommended answer

The difference between one-shot and incremental compression is that in one-shot mode you need to hold the entire input in memory; if you are compressing a 100-gigabyte file, you had better have loads of RAM.

With the incremental compressor, your code can feed in 1 megabyte or 1 kilobyte at a time and write whatever output results to a file as soon as it is available. Another benefit is that you can use an incremental compressor to stream data: you can start writing compressed data before all of the uncompressed data is available!
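As a rough illustration of feeding the compressor a piece at a time, here is a minimal sketch. The helper name compress_stream, the chunk size, and the in-memory streams standing in for real files are all assumptions made for the example, not part of the original question.

```python
import bz2
import io

def compress_stream(src, dst, chunk_size=1024 * 1024):
    """Read src in chunks and write bz2-compressed bytes to dst as they appear."""
    compressor = bz2.BZ2Compressor()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # compress() may return b'' while the compressor is still buffering.
        dst.write(compressor.compress(chunk))
    # flush() returns whatever is still buffered; skipping it loses data.
    dst.write(compressor.flush())

# In-memory streams stand in for real files here (hypothetical sample data).
source = io.BytesIO(b"id,name\n1,alice\n" * 10000)
target = io.BytesIO()
compress_stream(source, target, chunk_size=4096)

# Round-trip check: decompressing the output gives back the input.
restored = bz2.decompress(target.getvalue())
```

Only one chunk of uncompressed input is held in memory at any moment, which is the point of the incremental interface.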

Your second snippet is incorrect and will cause you to lose data: flush() may return more data, which you need to save as well. Here I compress a string of 1000 'a' characters in Python 3; compress() returns an empty bytes object, and the actual compressed data is returned from flush().

>>> import bz2
>>> c = bz2.BZ2Compressor()
>>> c.compress(b'a' * 1000)
b''
>>> c.flush()
b'BZh91AY&SYI\xdcOc\x00\x00\x01\x81\x01\xa0\x00\x00\x80\x00\x08 \x00 \xaamA\x98\xba\x83\xc5\xdc\x91N\x14$\x12w\x13\xd8\xc0'

Thus the compression part of your second snippet should be:

compressor = bz2.BZ2Compressor()
content = compressor.compress(bytes(csv_string, 'UTF-8'))  # also with lzma
content += compressor.flush()    

But in practice you are still doing one-shot compression, just in a very complicated way, because the whole csv_string is handed to the compressor in a single compress() call.
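For comparison, a genuinely incremental version might look something like the sketch below. It relies on ZipFile.open(name, 'w'), available since Python 3.6, to get a writable member inside the archive. The helper name write_bz2_member, the generated sample rows, the in-memory buffer, and the choice of ZIP_STORED (on the assumption that deflating data already compressed with bz2 gains little) are illustrative and not taken from the question.

```python
import bz2
import io
from zipfile import ZipFile, ZIP_STORED

def write_bz2_member(zf, name, chunks):
    """Stream bz2-compressed chunks into a single zip member."""
    compressor = bz2.BZ2Compressor()
    # ZipFile.open(name, 'w') returns a writable file-like object.
    with zf.open(name, "w") as member:
        for chunk in chunks:
            member.write(compressor.compress(chunk))
        # Write the remaining buffered data before the member is closed.
        member.write(compressor.flush())

# Hypothetical data source: rows produced one at a time, never joined in memory.
rows = (f"{i},row{i}\n".encode("utf-8") for i in range(1000))

buffer = io.BytesIO()  # stands in for 'something.zip'
with ZipFile(buffer, "w", compression=ZIP_STORED) as zf:
    write_bz2_member(zf, "something.csv.bz2", rows)

# Read the member back and decompress it to confirm nothing was lost.
with ZipFile(buffer) as zf:
    restored = bz2.decompress(zf.read("something.csv.bz2"))
```

Here the rows could just as well come from a database cursor, so the full CSV never has to exist as one string.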
