difflib.SequenceMatcher isjunk argument not considered?

Problem description

In the Python difflib library, is the SequenceMatcher class behaving unexpectedly, or am I misreading what the intended behavior is?

Why does the isjunk argument seem to make no difference in this case?

difflib.SequenceMatcher(None, "AA", "A A").ratio() returns 0.8

difflib.SequenceMatcher(lambda x: x in ' ', "AA", "A A").ratio() returns 0.8

My understanding is that if spaces are ignored, the ratio should be 1.

Recommended answer

This happens because ratio() uses the total length of both sequences when calculating the ratio and does not filter elements with isjunk. So, as long as the matching blocks add up to the same number of matches with and without isjunk, the ratio will be the same.

I assume the sequences are not filtered by isjunk for performance reasons.

def ratio(self):   
    """Return a measure of the sequences' similarity (float in [0,1]).

    Where T is the total number of elements in both sequences, and
    M is the number of matches, this is 2.0*M / T.
    """

    matches = sum(triple[-1] for triple in self.get_matching_blocks())
    return _calculate_ratio(matches, len(self.a) + len(self.b))
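
As a quick check of this reading of ratio(), the sketch below recomputes 2.0*M / T by hand from get_matching_blocks() and compares it with the library result. The helper name manual_ratio is made up for illustration and is not part of difflib.

```python
import difflib


def manual_ratio(matcher):
    """Recompute 2.0 * M / T from the public API (hypothetical helper)."""
    # M: total size of all matching blocks (the final sentinel block has size 0).
    matches = sum(block.size for block in matcher.get_matching_blocks())
    # T: lengths of the unfiltered sequences; isjunk is not consulted here.
    total = len(matcher.a) + len(matcher.b)
    return 2.0 * matches / total if total else 1.0


plain = difflib.SequenceMatcher(None, "AA", "A A")
junk = difflib.SequenceMatcher(lambda x: x == " ", "AA", "A A")

print(manual_ratio(plain), plain.ratio())  # 0.8 0.8
print(manual_ratio(junk), junk.ratio())    # 0.8 0.8
```

Both matchers give the same value because T is always len("AA") + len("A A") = 5, whatever isjunk says.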

self.a and self.b are the strings (sequences) passed to the SequenceMatcher object ("AA" and "A A" in your example). The isjunk function lambda x: x in ' ' is only used to determine the matching blocks. Your example is quite simple, so the resulting ratio and matching blocks are the same for both calls.

difflib.SequenceMatcher(None, "AA", "A A").get_matching_blocks()
[Match(a=0, b=0, size=1), Match(a=1, b=2, size=1), Match(a=2, b=3, size=0)]

difflib.SequenceMatcher(lambda x: x == ' ', "AA", "A A").get_matching_blocks()
[Match(a=0, b=0, size=1), Match(a=1, b=2, size=1), Match(a=2, b=3, size=0)]

Same matching blocks, and the ratio is: M = 2, T = 5 => ratio = 2.0 * 2 / 5 = 0.8

Now consider the following example:

difflib.SequenceMatcher(None, "AA ", "A A").get_matching_blocks()
[Match(a=1, b=0, size=2), Match(a=3, b=3, size=0)]

difflib.SequenceMatcher(lambda x: x == ' ', "AA ", "A A").get_matching_blocks()
[Match(a=0, b=0, size=1), Match(a=1, b=2, size=1), Match(a=3, b=3, size=0)]

Now the matching blocks are different, but the ratio is the same because the number of matches is still equal:

When isjunk is None: M = 2, T = 6 => ratio = 2.0 * 2 / 6

When isjunk is lambda x: x == ' ': M = 1 + 1, T = 6 => ratio = 2.0 * 2 / 6

Finally, an example where the number of matches differs:

difflib.SequenceMatcher(None, "AA ", "A A ").get_matching_blocks()
[Match(a=1, b=0, size=2), Match(a=3, b=4, size=0)]

difflib.SequenceMatcher(lambda x: x == ' ', "AA ", "A A ").get_matching_blocks()
[Match(a=0, b=0, size=1), Match(a=1, b=2, size=2), Match(a=3, b=4, size=0)]

Here the number of matches is different:

When isjunk is None: M = 2, T = 7 => ratio = 2.0 * 2 / 7

When isjunk is lambda x: x == ' ': M = 1 + 2, T = 7 => ratio = 2.0 * 3 / 7
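
If the goal is a ratio that ignores spaces entirely, as the question expected, one option is to drop them from both inputs before comparing, since ratio() counts every element of self.a and self.b. This is only a minimal sketch, assuming spaces really should be discarded rather than just deprioritized; the helper name ratio_ignoring and its is_ignored parameter are hypothetical and not part of difflib.

```python
import difflib


def ratio_ignoring(a, b, is_ignored=lambda ch: ch == " "):
    """Similarity ratio after dropping ignored elements (hypothetical helper)."""
    # Filter both sequences up front so ignored elements count toward
    # neither M (matches) nor T (total length).
    a_kept = [ch for ch in a if not is_ignored(ch)]
    b_kept = [ch for ch in b if not is_ignored(ch)]
    return difflib.SequenceMatcher(None, a_kept, b_kept).ratio()


print(ratio_ignoring("AA", "A A"))    # 1.0
print(ratio_ignoring("AA ", "A A "))  # 1.0
```

Note that this changes what is being measured: the matching blocks now refer to positions in the filtered sequences, not in the original strings.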
