Hamming Distance vs. Levenshtein Distance

Problem description

For the problem I'm working on, finding distances between two sequences to determine their similarity, sequence order is very important. However, the sequences that I have are not all the same length, so I pad the shorter string with empty points until both sequences are the same length, in order to satisfy the Hamming distance requirement. Is there any major problem with doing this, since all I care about is the number of transpositions (not insertions or deletions, which Levenshtein counts)?
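The padding approach described above can be sketched as follows. This is only an illustration, not the asker's actual code: the function name `padded_hamming` and the `_PAD` sentinel are hypothetical, and the sketch assumes the shorter sequence is padded at the end with a value that never equals a real element.

```python
from itertools import zip_longest

# Hypothetical sentinel standing in for the "empty points" used as padding.
# A fresh object() never compares equal to any real sequence element.
_PAD = object()


def padded_hamming(a, b):
    """Count positions where a and b differ, padding the shorter one at the end."""
    return sum(x != y for x, y in zip_longest(a, b, fillvalue=_PAD))


print(padded_hamming("abc", "abd"))       # one substituted position
print(padded_hamming("abc", "ab"))        # the padded position counts as a mismatch
print(padded_hamming("abcdef", "bcdef"))  # a single missing element shifts every position
```

Note that with end padding, one missing element near the start of a sequence makes every later position a mismatch, so the result depends heavily on where the length difference arises.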

I've found that Hamming distance is much, much faster than Levenshtein as a distance metric for longer sequences. When should one use Levenshtein distance (or derivatives of Levenshtein distance) instead of the much cheaper Hamming distance? Hamming distance can be considered the upper bound for possible Levenshtein distances between two sequences, so if I am comparing the two sequences for an order-biased similarity metric rather than the absolute minimal number of moves to match the sequences, there isn't an apparent reason for me to choose Levenshtein over Hamming as a metric, is there?

Recommended answer

That question really depends on the types of sequences you are matching, and what result you want.

If it's not a problem that "1234567890" and "0123456789" are considered totally different, then Hamming distance is indeed fine.
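To make the example concrete, here is a minimal sketch that computes both distances for the two strings above. The function names `hamming` and `levenshtein` are chosen for illustration, and the Levenshtein part is the textbook two-row dynamic programming formulation with unit costs, not an optimized implementation.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(hamming("1234567890", "0123456789"))      # 10: every position differs
print(levenshtein("1234567890", "0123456789"))  # 2: insert "0" at the front, delete it at the end
```

The Hamming computation is a single pass over the sequences, while this Levenshtein sketch fills a table of size len(a) × len(b), which is consistent with the speed difference the question describes. The trade-off is that a one-position shift, which Levenshtein scores as 2, is scored by Hamming as a complete mismatch.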
