差异算法 [英] Diff Algorithm

查看:545
本文介绍了差异算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在寻找像疯了差异算法可行,高效的解释。

I've been looking like crazy for an explanation of a diff algorithm that works and is efficient.

我得到的最接近的是此链接RFC 3284 (从几个埃里克库的博客文章),它描述在完全可以理解的方面的数据格式的,其中差异的结果被存储。但是,任何没有提及,如何同时做一个diff程序将达到这些结果。

The closest I got is this link to RFC 3284 (from several Eric Sink blog posts), which describes in perfectly understandable terms the data format in which the diff results are stored. However, it has no mention whatsoever as to how a program would reach these results while doing a diff.

我想研究这个出于个人好奇,因为我敢肯定,实施差异的算法,这是pretty的时而清晰,当你在比较和不知道的时候,必须有取舍为什么差异项目选择这个作为一个变化,而不是那个?......

I'm trying to research this out of personal curiosity, because I'm sure there must be tradeoffs when implementing a diff algorithm, which are pretty clear sometimes when you look at diffs and wonder "why did the diff program chose this as a change instead of that?"...

有谁知道在哪里可以找到一个有效的算法,最终会输出VCDIFF的描述?
顺便说一句,如果你碰巧发现使用SourceGear的DiffMerge实际算法的描述,那会更好。

Does anyone know where I can find a description of an efficient algorithm that'd end up outputting VCDIFF?
By the way, if you happen to find a description of the actual algorithm used by SourceGear's DiffMerge, that'd be even better.

注:最长公共子似乎并没有被使用VCDIFF算法,它看起来像他们正在做一些更聪明,因为他们使用的数据格式

NOTE: longest common subsequence doesn't seem to be the algorithm used by VCDIFF, it looks like they're doing something smarter, given the data format they use.

谢谢!

推荐答案

的O(ND)差分算法及其变体形式是一个奇妙的纸,你可能想从那里开始。它包括伪code和参与做差异图的遍历一个很好的可视化。

An O(ND) Difference Algorithm and its Variations is a fantastic paper and you may want to start there. It includes pseudo-code and a nice visualization of the graph traversals involved in doing the diff.

第4节的介绍了一些改进的算法,使得它非常有效。

Section 4 of the paper introduces some refinements to the algorithm that make it very effective.

成功地实施,这将留给你一个非常有用的工具,你的工具箱中(也可能是一些优秀的经验,以及)。

Successfully implementing this will leave you with a very useful tool in your toolbox (and probably some excellent experience as well).

生成需要有时会非常棘手的输出格式,但是如果你有算法内部的了解,那么你应该能够输出任何你所需要的。您也可以引入启发式影响产量和做出一定的权衡。

Generating the output format you need can sometimes be tricky, but if you have understanding of the algorithm internals, then you should be able to output anything you need. You can also introduce heuristics to affect the output and make certain tradeoffs.

下面是一个页面包含了一下文件,的全部源$ C ​​$ C 和使用的技术在上述算法差异算法的例子。

Here is a page that includes a bit of documentation, full source code, and examples of a diff algorithm using the techniques in the aforementioned algorithm.

借助源$ C ​​$ 似乎密切遵循的基本算法和易于阅读℃。

The source code appears to follow the basic algorithm closely and is easy to read.

还有一个有点preparing输入,这可能对你有用。有一个在输出的巨大差异,当你通过文字或标记(字)版本比较。

There's also a bit on preparing the input, which you may find useful. There's a huge difference in output when you are diffing by character or token (word).

祝你好运!

这篇关于差异算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆