差异算法? [英] Diff Algorithm?

查看:161
本文介绍了差异算法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在疯狂地解释一种有效且有效的差异算法。

I've been looking like crazy for an explanation of a diff algorithm that works and is efficient.

我最接近的是此RFC 3284链接(来自Eric Sink的几篇博客文章),该链接以完全易于理解的术语描述了数据格式存储差异结果。但是,对于程序在进行比较时如何达到这些结果,它却丝毫没有提及。

The closest I got is this link to RFC 3284 (from several Eric Sink blog posts), which describes in perfectly understandable terms the data format in which the diff results are stored. However, it has no mention whatsoever as to how a program would reach these results while doing a diff.

我正出于个人的好奇心进行研究,因为我'确保在实现差异算法时必须权衡取舍,当您查看差异时,这很清楚,有时您会想知道为什么差异程序选择了此而不是将其作为更改? ...

I'm trying to research this out of personal curiosity, because I'm sure there must be tradeoffs when implementing a diff algorithm, which are pretty clear sometimes when you look at diffs and wonder "why did the diff program chose this as a change instead of that?"...

在哪里可以找到最终输出VCDIFF的有效算法的描述?

如果您碰巧找到了所用实际算法的描述,

Where can I find a description of an efficient algorithm that'd end up outputting VCDIFF?
By the way, if you happen to find a description of the actual algorithm used by SourceGear's DiffMerge, that'd be even better.

注意:最长的公共子序列似乎不是VCDIFF所使用的算法,看起来他们在做

NOTE: longest common subsequence doesn't seem to be the algorithm used by VCDIFF, it looks like they're doing something smarter, given the data format they use.

推荐答案

O(ND)差异算法及其变体是一本奇妙的论文,您可能要从那里开始。

An O(ND) Difference Algorithm and its Variations is a fantastic paper and you may want to start there. It includes pseudo-code and a nice visualization of the graph traversals involved in doing the diff.

第4章,其中包括伪代码和很好的可视化关系图。

Section 4 of the paper introduces some refinements to the algorithm that make it very effective.

成功实现此功能将为您提供一个非常有用的工具,在您的工具箱中(可能还有一些出色的经验)。

Successfully implementing this will leave you with a very useful tool in your toolbox (and probably some excellent experience as well).

生成所需的输出格式有时会比较棘手,但是如果您了解算法的内部原理,那么您应该可以输出所需的任何内容。您还可以引入启发式方法来影响输出并进行某些折衷。

Generating the output format you need can sometimes be tricky, but if you have understanding of the algorithm internals, then you should be able to output anything you need. You can also introduce heuristics to affect the output and make certain tradeoffs.

这是一个页面,其中包含一些文档,完整的源代码,以及使用上述算法中的技术的diff算法示例。

Here is a page that includes a bit of documentation, full source code, and examples of a diff algorithm using the techniques in the aforementioned algorithm.

源代码似乎紧跟基本算法,而且易于阅读。

The source code appears to follow the basic algorithm closely and is easy to read.

还有一些准备输入的内容,您可能会觉得有用。当您按字符或标记(单词)进行区分时,输出会有巨大差异。

There's also a bit on preparing the input, which you may find useful. There's a huge difference in output when you are diffing by character or token (word).

祝您好运!

这篇关于差异算法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆