Sorting a file to optimize for compression efficiency


Problem Description

We have some large data files that are being concatenated, compressed, and then sent to another server. The compression reduces the transmission time to the destination server, so the smaller we can get the file in a short period of time, the better. This is a highly time-sensitive process.

The data files contain many rows of tab-delimited text, and the order of the rows does not matter.

We noticed that when we sorted the file by the first field, the compressed file size was much smaller, presumably because duplicates of that column end up next to each other. However, sorting a large file is slow, and there's no real reason it needs to be sorted other than that this happens to improve compression. There's also no relationship between what's in the first column and what's in subsequent columns. There could be some ordering of rows that compresses even smaller, or alternatively there could be an algorithm that similarly improves compression performance but requires less time to run.
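The effect is easy to reproduce; here is a small synthetic experiment (the field layout and row count are made up) comparing gzip output with and without a sort on the first field:

```python
import gzip
import random

# Synthetic tab-delimited rows whose first field repeats heavily.
rows = [
    f"user{random.randrange(100)}\t{random.random()}\t{random.random()}"
    for _ in range(50_000)
]

unsorted_blob = "\n".join(rows).encode()
sorted_blob = "\n".join(
    sorted(rows, key=lambda r: r.split("\t", 1)[0])
).encode()

print("unsorted:", len(gzip.compress(unsorted_blob)), "bytes")
print("sorted:  ", len(gzip.compress(sorted_blob)), "bytes")
```

On real data with many distinct first-field values, the gap from sorting tends to be larger, because matching prefixes move closer together and back-references get cheaper to encode.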

What approach could I use to reorder rows to optimize the similarity between neighboring rows and improve compression performance?

Recommended Answer

Here are some suggestions:


  1. Split the file into smaller batches and sort those. Sorting several small data sets is faster than sorting one big blob, and the work is easy to parallelize this way (see the first sketch after this list).

  2. Experiment with different compression algorithms. Different algorithms have different throughputs and ratios; you are interested in the algorithms on the sparse boundary of those two dimensions, i.e., the ones where no alternative is both faster and smaller (see the second sketch after this list).

  3. Use a larger dictionary size. This lets the compressor reference data further back in the stream (also covered in the second sketch).
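A minimal sketch of suggestion 1 in Python, assuming the whole input fits in memory; the function names (`sort_chunk`, `compress_in_sorted_chunks`) and the chunk size are hypothetical choices, not part of the original answer:

```python
import gzip
from concurrent.futures import ProcessPoolExecutor


def sort_chunk(lines):
    # Sorting each chunk independently is much cheaper than one global
    # sort, yet still puts duplicate first fields next to each other
    # within the chunk, which is what helps the compressor.
    return sorted(lines, key=lambda line: line.split("\t", 1)[0])


def compress_in_sorted_chunks(in_path, out_path, chunk_size=100_000):
    with open(in_path) as f:
        lines = f.readlines()
    chunks = [lines[i:i + chunk_size]
              for i in range(0, len(lines), chunk_size)]
    # Sort the chunks in parallel across processes.
    with ProcessPoolExecutor() as pool:
        sorted_chunks = list(pool.map(sort_chunk, chunks))
    with gzip.open(out_path, "wt") as out:
        for chunk in sorted_chunks:
            out.writelines(chunk)
```

Because row order doesn't matter, concatenating independently sorted chunks is safe, and each chunk captures most of the locality benefit of a full sort at a fraction of the cost.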
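For suggestions 2 and 3, a rough benchmark over the compressors in Python's standard library can map out the throughput/ratio trade-off. The input file name is a placeholder, and lzma's preset stands in here for an explicit dictionary-size knob, since higher presets use (among other things) larger dictionaries:

```python
import bz2
import gzip
import lzma
import time

data = open("sample.tsv", "rb").read()  # placeholder input file

# Candidate compressors at a few settings. Higher lzma presets use
# (among other things) larger dictionaries, letting the encoder
# reference data further back in the stream.
candidates = {
    "gzip-1": lambda d: gzip.compress(d, compresslevel=1),
    "gzip-9": lambda d: gzip.compress(d, compresslevel=9),
    "bz2-9": lambda d: bz2.compress(d, compresslevel=9),
    "lzma-0": lambda d: lzma.compress(d, preset=0),
    "lzma-6": lambda d: lzma.compress(d, preset=6),
}

for name, compress in candidates.items():
    start = time.perf_counter()
    size = len(compress(data))
    elapsed = time.perf_counter() - start
    print(f"{name}: {size} bytes in {elapsed:.2f} s")
```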

Note that sorting is important no matter which algorithm and dictionary size you choose, because references to old data tend to use more bits. Also, sorting by a time dimension tends to group together rows that come from a similar data distribution. For example, Stack Overflow has more bot traffic at night than during the day, so the UserAgent field value distribution in its HTTP logs probably varies greatly with the time of day.
