Algorithm for efficient diffing of huge files


Question

I have to store two files A and B which are both very large (like 100 GB). However, B is likely to be similar in large parts to A, so I could store A and diff(A, B). There are two interesting aspects to this problem:


  1. The files are too big to be analyzed by any diff library I know of, because those libraries work in-memory.

  2. I don't actually need a diff. A conventional diff has inserts, edits, and deletes because it is meant to be read by humans. I can get away with less information: I only need "new range of bytes" and "copy bytes from the old file at an arbitrary offset".

I am currently at a loss as to how to compute the delta from A to B under these conditions. Does anyone know of an algorithm for this?

Again, the problem is simple: write an algorithm that can store files A and B in as few bytes as possible, given that the two are quite similar.

Additional info: Although big parts might be identical, they are likely to be at different offsets and out of order. This last fact is why a conventional diff might not save much.
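To make the target format concrete, here is a minimal sketch of such a delta, assuming just two instruction kinds (the class and function names are illustrative, not taken from any particular library): copy a byte range from the old file A, or insert literal bytes that only exist in B.

```python
from dataclasses import dataclass

@dataclass
class Copy:
    offset: int   # byte offset into the old file A
    length: int   # number of bytes to copy from A

@dataclass
class Insert:
    data: bytes   # literal bytes that appear only in B

def apply_delta(old_path: str, delta: list, new_path: str) -> None:
    """Rebuild B by replaying the instruction list against A."""
    with open(old_path, "rb") as old, open(new_path, "wb") as out:
        for op in delta:
            if isinstance(op, Copy):
                old.seek(op.offset)
                out.write(old.read(op.length))
            else:
                out.write(op.data)
```

Whatever produces the delta only has to emit these two instructions; the consumer rebuilds B from A plus the delta, so storing A and the delta is enough.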

Answer

Take a look at rsync's algorithm: it is designed to do pretty much exactly this, so it can copy deltas efficiently. And the algorithm is pretty well documented, as I recall.
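For reference, the core of rsync's approach is: split A into fixed-size blocks, index each block by a cheap "weak" rolling checksum plus a strong hash, then slide a byte-at-a-time window over B; wherever a window matches a block of A, emit a copy instruction, and everything in between becomes literal inserts. A simplified in-memory sketch follows (the block size, the weak checksum, and the in-memory signature dict are illustrative assumptions; real rsync/librsync use a rolling Adler-32-style checksum plus MD5 and stream the data, so 100 GB files never need to fit in memory):

```python
import hashlib

BLOCK = 4096  # assumed block size; rsync chooses this adaptively

def weak_sum(data: bytes) -> tuple:
    """Cheap checksum; in real rsync it is updated in O(1) as the window slides."""
    a = sum(data) % 65536
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % 65536
    return a, b

def make_signature(old: bytes) -> dict:
    """Index every fixed-size block of A: weak checksum -> [(strong hash, offset)]."""
    sig = {}
    for off in range(0, len(old), BLOCK):
        block = old[off:off + BLOCK]
        sig.setdefault(weak_sum(block), []).append((hashlib.md5(block).digest(), off))
    return sig

def make_delta(old: bytes, new: bytes) -> list:
    """Scan B; emit ('copy', offset_in_A, length) and ('insert', literal_bytes)."""
    sig = make_signature(old)
    delta, literal, i = [], bytearray(), 0
    while i + BLOCK <= len(new):
        window = new[i:i + BLOCK]
        # Recomputed here for clarity; the rolling property is what makes
        # checking every byte offset of B affordable in practice.
        key = weak_sum(window)
        match = None
        if key in sig:
            strong = hashlib.md5(window).digest()
            match = next((off for s, off in sig[key] if s == strong), None)
        if match is not None:
            if literal:
                delta.append(("insert", bytes(literal)))
                literal.clear()
            delta.append(("copy", match, BLOCK))
            i += BLOCK
        else:
            literal.append(new[i])
            i += 1
    literal.extend(new[i:])  # trailing bytes that never formed a full block
    if literal:
        delta.append(("insert", bytes(literal)))
    return delta
```

Because matching is done block-by-block against a hash table of A's blocks, identical regions are found even when they sit at different offsets in B or appear in a different order, which is exactly why a conventional line-based diff falls short here.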
