Text difference algorithm
Question
I need an algorithm that can compare two text files and highlight their differences and, even better, compute their difference in a meaningful way: two similar files should get a higher similarity score than two dissimilar files, with the word "similar" understood in its everyday sense. It sounds easy to implement, but it's not.
The implementation can be in C# or Python.
Thanks.
Recommended answer
In Python, there is difflib, as others have also suggested.
difflib offers the SequenceMatcher class, which can be used to give you a similarity ratio. Example function:
import difflib

def text_compare(text1, text2, isjunk=None):
    return difflib.SequenceMatcher(isjunk, text1, text2).ratio()  # 1.0 = identical, 0.0 = nothing in common
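The question also asks for highlighting the differences, which the ratio alone does not give. A minimal sketch of how both parts might look with difflib follows. The helper names similarity and line_diff, the file labels "a" and "b", and the sample strings are made up for illustration and are not from the original answer.

```python
import difflib


def similarity(text1, text2):
    """Return a similarity ratio in [0.0, 1.0]; 1.0 means the inputs are identical."""
    return difflib.SequenceMatcher(None, text1, text2).ratio()


def line_diff(text1, text2):
    """Return a unified diff of the two texts, compared line by line."""
    return list(difflib.unified_diff(
        text1.splitlines(),
        text2.splitlines(),
        fromfile="a",   # hypothetical labels for the two inputs
        tofile="b",
        lineterm="",    # lines were split without newlines, so add none
    ))


if __name__ == "__main__":
    # Sample inputs, invented for this sketch
    old = "the quick brown fox\njumps over the lazy dog\n"
    new = "the quick brown fox\nleaps over the lazy dog\n"

    print(similarity(old, new))
    # Removed lines start with "-", added lines with "+"
    print("\n".join(line_diff(old, new)))
```

Note that SequenceMatcher compares whatever sequence it is given: passing strings compares character by character, while passing lists of lines or words compares at that granularity, which can be faster and closer to what "similar" means for long documents.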