Binary patch-generation in C#


Question

Does anyone have, or know of, a binary patch generation algorithm implementation in C#?

Basically, compare two files (designated old and new), and produce a patch file that can be used to upgrade the old file to have the same contents as the new file.

The implementation would have to be relatively fast, and work with huge files. It should exhibit O(n) or O(log n) runtimes.

My own algorithms tend to either be lousy (fast but produce huge patches) or slow (produce small patches but have O(n^2) runtime).

Any advice, or pointers for implementation would be nice.

Specifically, the implementation will be used to keep servers in sync for various large datafiles that we have one master server for. When the master server datafiles change, we need to update several off-site servers as well.

The most naive algorithm I have made, which only works for files that can be kept in memory, is as follows:

  1. Grab the first four bytes from the old file, call this the key
  2. Add those bytes to a dictionary, where key -> position, where position is the position where I grabbed those 4 bytes, 0 to begin with
  3. Skip the first of these four bytes, grab another 4 (3 overlapping, 1 new), and add to the dictionary the same way
  4. Repeat steps 1-3 for all 4-byte blocks in the old file
  5. From the start of the new file, grab 4 bytes, and attempt to look it up in the dictionary
  6. If found, find the longest match if there are several, by comparing bytes from the two files
  7. Encode a reference to that location in the old file, and skip the matched block in the new file
  8. If not found, encode 1 byte from the new file, and skip it
  9. Repeat steps 5-8 for the rest of the new file

This is somewhat like compression, without windowing, so it will use a lot of memory. It is, however, fairly fast, and produces quite small patches, as long as I try to make the codes output minimal.
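
To make the steps above concrete, here is a minimal C# sketch of that naive in-memory approach. It is not the original implementation: the names (NaiveBinaryDiff, BuildIndex, Diff) and the copy/literal tuple encoding are illustrative assumptions, and a real patch format would serialize the operations far more compactly.

```csharp
using System;
using System.Collections.Generic;

static class NaiveBinaryDiff
{
    // Steps 1-4: index every overlapping 4-byte block of the old file.
    // The key is the block packed into an int; the value is every position
    // in the old file where a block with that key starts.
    static Dictionary<int, List<int>> BuildIndex(byte[] oldData)
    {
        var index = new Dictionary<int, List<int>>();
        for (int pos = 0; pos + 4 <= oldData.Length; pos++)
        {
            int key = BitConverter.ToInt32(oldData, pos);
            if (!index.TryGetValue(key, out var positions))
                index[key] = positions = new List<int>();
            positions.Add(pos);
        }
        return index;
    }

    // Steps 5-9: walk the new file, emitting either a copy of `length` bytes
    // starting at `oldPos` in the old file, or a single literal byte.
    public static IEnumerable<(bool isCopy, int oldPos, int length, byte literal)>
        Diff(byte[] oldData, byte[] newData)
    {
        var index = BuildIndex(oldData);
        int i = 0;
        while (i < newData.Length)
        {
            int bestPos = -1, bestLen = 0;
            if (i + 4 <= newData.Length &&
                index.TryGetValue(BitConverter.ToInt32(newData, i), out var candidates))
            {
                // Step 6: several old positions can share the same 4-byte key;
                // extend each candidate byte-by-byte and keep the longest match.
                foreach (int p in candidates)
                {
                    int len = 0;
                    while (p + len < oldData.Length && i + len < newData.Length &&
                           oldData[p + len] == newData[i + len])
                        len++;
                    if (len > bestLen) { bestLen = len; bestPos = p; }
                }
            }

            if (bestLen >= 4)
            {
                yield return (true, bestPos, bestLen, 0);  // step 7: copy from the old file
                i += bestLen;
            }
            else
            {
                yield return (false, 0, 1, newData[i]);    // step 8: one literal byte
                i++;
            }
        }
    }
}
```

The index holds an entry for every position in the old file, which is exactly why this variant needs the whole file in memory; the per-candidate match extension is what keeps the emitted patches small.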

A more memory-efficient algorithm uses windowing, but produces much bigger patch files.

There are more nuances to the above algorithm that I skipped in this post, but I can post more details if necessary. I do, however, feel that I need a different algorithm altogether, so improving on the above algorithm is probably not going to get me far enough.

Edit #1: Here is a more detailed description of the above algorithm.

First, combine the two files, so that you have one big file. Remember the cut-point between the two files.

Secondly, do the "grab 4 bytes and add their position to the dictionary" step for everything in the whole file.

Thirdly, from where the new file starts, do the loop with attempting to locate an existing combination of 4 bytes, and find the longest match. Make sure we only consider positions from the old file, or from earlier in the new file than we're currently at. This ensures that we can reuse material in both the old and the new file during patch application.
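
A rough sketch of that restriction, assuming the old and new files are concatenated old-first into one buffer and cutPoint marks where the new file begins (the names are illustrative, not taken from the original source):

```csharp
// Combined-buffer variant: a candidate match position is usable only if it lies
// in the old file, or earlier in the new file than the block currently being
// encoded. With combined-buffer coordinates the two cases reduce to a single
// "strictly earlier than the current position" test.
static bool IsUsableMatch(int candidatePos, int currentPos, int cutPoint)
{
    bool inOldFile = candidatePos < cutPoint;
    bool earlierInNewFile = candidatePos >= cutPoint && candidatePos < currentPos;
    return inOldFile || earlierInNewFile;
}
```

During patch application, the part of the new file that has already been reconstructed plays the role of the "earlier in the new file" region, so copies that point there are always resolvable.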

Edit #2: Source code for the above algorithm.

You might get a warning about the certificate having some problems. I don't know how to resolve that so for the time being just accept the certificate.

The source uses lots of other types from the rest of my library so that file isn't all it takes, but that's the algorithm implementation.

@lomaxx, I have tried to find good documentation for the algorithm used in subversion, called xdelta, but unless you already know how the algorithm works, the documents I've found fail to tell me what I need to know.

Or perhaps I'm just dense... :)

I took a quick peek at the algorithm from that site you gave, and it is unfortunately not usable. A comment from the binary diff file says:

Finding an optimal set of differences requires quadratic time relative to the input size, so it becomes unusable very quickly.

My needs aren't optimal though, so I'm looking for a more practical solution.

Thanks for the answer though; I added a bookmark to his utilities in case I ever need them.

Edit #1: Note, I will look at his code to see if I can find some ideas, and I'll also send him an email later with questions, but I've read that book he references and though the solution is good for finding optimal solutions, it is impractical in use due to the time requirements.

Edit #2: I'll definitely hunt down the python xdelta implementation.

Answer

Sorry I couldn't be more help. I would definitely keep looking at xdelta because I have used it a number of times to produce quality diffs on 600MB+ ISO files we have generated for distributing our products, and it performs very well.
