Library to compress text data and store it as text

Views: 98 Published: 2020/10/7 2:36:30 Tags: zlib compression huffman-code


Problem description

I want to store web pages in compressed text files (CSV). To get the best compression, I would like to give the library a sample set of 1000 web pages and let it spend some time building the optimal "dictionary" for that content. One obvious "dictionary" entry would be <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">, which is present on almost all web pages and could be stored as %1 or something like that. With a customized dictionary like this, I expect compression rates of 99% in my case.
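To make the "dictionary" idea concrete, here is a minimal sketch using the preset-dictionary feature of zlib through Python's standard `zlib` module. It is only an illustration under assumptions: `SHARED_DICT`, `compress_with_dict`, and `decompress_with_dict` are hypothetical names, and the dictionary here is written by hand rather than derived from a sample of 1000 pages.

```python
import zlib

# Hypothetical hand-written dictionary of byte strings expected to recur.
# Python's zlib docs suggest placing the most frequent substrings near the end.
SHARED_DICT = (
    b'<html><head><title></title></head><body></body></html>'
    b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    b'"http://www.w3.org/TR/html4/strict.dtd">'
)


def compress_with_dict(text: str, zdict: bytes = SHARED_DICT) -> bytes:
    """Deflate the text, letting matches refer back into the preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(text.encode("utf-8")) + comp.flush()


def decompress_with_dict(data: bytes, zdict: bytes = SHARED_DICT) -> str:
    """Inflate data produced by compress_with_dict; the same dictionary is required."""
    decomp = zlib.decompressobj(zdict=zdict)
    return (decomp.decompress(data) + decomp.flush()).decode("utf-8")


# Illustrative page, not real data.
page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
    '<html><head><title>Hi</title></head><body>Hello</body></html>'
)

packed = compress_with_dict(page)
plain = zlib.compress(page.encode("utf-8"), 9)
print(len(page), len(plain), len(packed))
```

The dictionary itself is not stored in the output, so the same bytes must be available when decompressing. The output is still binary at this point.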

My question is: does a library for doing this exist on Windows under MIT or a similarly liberal license? If not, are there any general-purpose compression libraries you would recommend? I have experimented a bit with zlib, but it outputs binary data. If I convert this binary data into text, I am worried that the result might be longer than the original text.
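The worry about the text form being longer than the original can be checked directly. The sketch below assumes Base64 as the binary-to-text encoding (the question does not name one); Base64 emits 4 characters for every 3 bytes, so the text form ends up shorter than the input whenever zlib shrinks the data to less than roughly three quarters of its size. The helper names `to_text` and `from_text` are hypothetical.

```python
import base64
import zlib


def to_text(raw: str) -> str:
    """Compress with zlib, then encode the binary result as ASCII Base64."""
    compressed = zlib.compress(raw.encode("utf-8"), 9)
    return base64.b64encode(compressed).decode("ascii")


def from_text(encoded: str) -> str:
    """Reverse to_text: Base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(encoded)).decode("utf-8")


# Illustrative, highly repetitive input; real pages will compress less well.
sample = "<p>hello world</p>\n" * 200
encoded = to_text(sample)
print(len(sample), len(encoded))
```

For very short or already-compressed inputs the encoded text can come out longer than the original, so it is worth measuring on representative pages.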

Edit: I need to be able to store the text in CSV files and still be able to import them into a database or even Excel.
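As a rough sketch of the CSV requirement, the example below writes Base64-encoded compressed pages into a CSV column and reads them back with Python's standard `csv` module. The Base64 alphabet contains no commas, quotes, or line breaks, so the cell needs no special escaping. The column names, the example URLs, and the helpers `encode_cell` and `decode_cell` are assumptions made for illustration.

```python
import base64
import csv
import io
import zlib


def encode_cell(text: str) -> str:
    """Compress a page and return it as a CSV-safe Base64 string."""
    return base64.b64encode(zlib.compress(text.encode("utf-8"), 9)).decode("ascii")


def decode_cell(cell: str) -> str:
    """Recover the original page from a Base64 cell."""
    return zlib.decompress(base64.b64decode(cell)).decode("utf-8")


# Illustrative rows; the page text contains commas, quotes, and newlines.
pages = [
    ("http://example.com/a", '<html>\n<body class="x">one, two</body>\n</html>'),
    ("http://example.com/b", "<html><body>three</body></html>"),
]

# Write to an in-memory buffer; a real file opened with newline="" works the same way.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["url", "page_b64"])
for url, html in pages:
    writer.writerow([url, encode_cell(html)])

# Read the CSV back and restore each page.
buf.seek(0)
restored = {row["url"]: decode_cell(row["page_b64"]) for row in csv.DictReader(buf)}
print(restored == dict(pages))
```

A database or spreadsheet can import such a file as ordinary text columns, but the page content stays unreadable until it is decoded and decompressed again.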
