在哪里可以找到差异算法? [英] Where can I find the diff algorithm?

查看:80
本文介绍了在哪里可以找到差异算法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在哪里可以找到diff算法的解释和实现?

Where can I find an explanation and implementation of the diff algorithm?

首先我必须认识到我不确定这是否是正确的名称算法的例如,堆栈溢出如何标记同一问题的两次编辑之间的差异?

First of all I have to recognize that I'm not sure if this is the correct name of the algorithm. For example, how does Stack Overflow mark the differences between two edits of the same question?

PS:我知道C和PHP编程语言。

PS: I know the C and PHP programming languages.

推荐答案

实际上没有差异算法之类的东西。 diff算法有很多不同的方法,实际上,在某些情况下使用的特定diff算法被视为特定diff工具的业务优势。

There is really no such thing as "the diff algorithm". There are many different diff algorithms, and in fact the particular diff algorithms used are in some cases considered a business advantage of the particular diff tool.

通常,许多diff算法

In general, many diff algorithms are based on the Longest Common Subsequence (LCS) problem.

1970年代最初的Unix diff 程序是基于这样的程序编写的。由Doug McIllroy编写,并使用了称为Hunt-McIllroy算法。大约40年后,对该算法的扩展和派生仍然非常普遍。

The original Unix diff program from the 1970s was written by Doug McIllroy and uses what is known as the Hunt-McIllroy algorithm. Almost 40 years later, extensions and derivatives of that algorithm are still very common.

几年前,Bram Cohen(最成功的文件共享程序的创建者,也是最不成功的文件共享程序的创建者)成功的版本控制系统)创建了耐心差异算法,该算法的目的是提供比LCS。

A couple of years ago, Bram Cohen (creator of the most successful filesharing program and the least successful version control system) created the Patience Diff algorithm that is designed to give more human-readable results than LCS. It was originally implemented in the Bazaar VCS and also added to Git as an option.

但是,除非您对diff算法的研究感兴趣,否则最好的选择是仅使用现有的差异库,例如 Davide Libenzi的LibXDiff ,例如Git所使用的。如果已经有PHP扩展程序包装了它,我不会感到惊讶。一个很好的替代方法是 Google的Diff-Match-Patch库,该库在Bespin或例如WhiteRoom,它可用于多种语言。

However, unless you are interested in research on diff algorithms, your best bet would probably be to just use some existing diff library like Davide Libenzi's LibXDiff, which is for example what Git uses. I wouldn't be too surprised if there was already a PHP extension wrapping it. An nice alternative is Google's Diff-Match-Patch library, which is used in Bespin or WhiteRoom, for example and which is available for many languages. It uses the Meyers Diff Algorithm plus some pre- and post-processing for additional speedups.

如果您对合并比扩散更感兴趣,则可以使用另一种完全不同的方法,即所谓的完全不同。运营转型。 OT的想法是,您不必找出两个文档之间的差异,而是尝试对导致这些差异的 operations 进行反向工程。这样可以进行更好的合并,因为您可以随后重放这些操作。这些对于实时协作式编辑器(例如EtherPad,Google Wave或SubEthaEdit)最有用。

A completely different approach, if you are more interested in merging than diffing, is called Operational Transformations. The idea of OT is that instead of figuring out the differences between two documents, you try to "reverse engineer" the operations that led to those differences. This allows for much better merging, because you can then "replay" those operations. These are most useful for real-time collaborative editors such as EtherPad, Google Wave or SubEthaEdit.

这篇关于在哪里可以找到差异算法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆