修改莱文斯坦 - 距离无视秩序 [英] Modify Levenshtein-Distance to ignore order

查看:213
本文介绍了修改莱文斯坦 - 距离无视秩序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在找计算包含多达6个数值序列之间的莱文斯坦距离。 这些值的顺序不应该影响的距离。

I'm looking to compute the the Levenshtein-distance between sequences containing up to 6 values. The order of these values should not affect the distance.

我将如何落实到迭代或递归算法呢?

How would I implement this into the iterative or recursive algorithm?

例如:

# Currently 
>>> LDistance('dog', 'god')
2

# Sorted
>>> LDistance('dgo', 'dgo')
0

# Proposed
>>> newLDistance('dog', 'god')
0

狗和上帝有相同的字母,排序字符串前手将返回所需的结果。然而,这不工作的所有时间:

'dog' and 'god' have the exact same letters, sorting the strings before hand will return the desired result. However this doesn't work all the time:

# Currently 
>>> LDistance('doge', 'gold')
3

# Sorted
>>> LDistance('dego', 'dglo')
2

# Proposed
>>> newLDistance('doge', 'gold')
1

总督和黄金有3/4的匹配字母,因此应该返回1的距离。 这是我目前的递归code:

'doge' and 'gold' have 3/4 matching letters and so should return a distance of 1. Here is my current recursive code:

def mLD(s, t):
    memo = {}
    def ld(s, t):
        if not s: return len(t)
        if not t: return len(s)
        if s[0] == t[0]: return ld(s[1:], t[1:])
        if (s, t) not in memo:
            l1 = ld(s, t[1:])
            l2 = ld(s[1:], t)
            l3 = ld(s[1:], t[1:])
            memo[(s,t)] = 1 + min(l1, l2, l3)
        return memo[(s,t)]
    return ld(s, t)

编辑:的后续问题:<一href="http://stackoverflow.com/questions/32459196/adding-exceptions-to-levenshtein-distance-like-algorithm">Adding例外莱文斯坦 - 距离般的算法

推荐答案

您不需要莱文斯坦机械这一点。

You don't need the Levenshtein machinery for this.

import collections
def distance(s1, s2):
    cnt = collections.Counter()
    for c in s1:
        cnt[c] += 1
    for c in s2:
        cnt[c] -= 1
    return sum(abs(diff) for diff in cnt.values()) // 2 + \
        (abs(sum(cnt.values())) + 1) // 2   # can be omitted if len(s1) == len(s2)

这篇关于修改莱文斯坦 - 距离无视秩序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆