Ignore spaces when comparing strings in Python

Problem description

I am using the difflib Python package. No matter whether I set the isjunk argument, the calculated ratios are the same. Shouldn't differences in spaces be ignored when isjunk is lambda x: x == " "?

In [193]: difflib.SequenceMatcher(isjunk=lambda x: x == " ", a="a b c", b="a bc").ratio()
Out[193]: 0.8888888888888888

In [194]: difflib.SequenceMatcher(a="a b c", b="a bc").ratio()
Out[194]: 0.8888888888888888

Recommended answer

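isjunk works a little differently than you might think. It does not delete characters from the comparison; it only marks elements (of the second sequence, b) that may not be used as the anchor of a match. Junk characters are still included in the total character count, and once a match has been found it can still be extended across equal junk characters on either side. ratio() returns 2.0*M/T, where M is the number of matched characters and T is the total length of both strings. For example, consider the following: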

>>> from difflib import SequenceMatcher
>>> SequenceMatcher(lambda x: x in "abcd", " abcd", "abcd abcd").ratio()
0.7142857142857143

None of the characters in "abcd" may anchor a match, so the only possible anchor is the space, at index 0 of the first string and index 4 of the second. From there the match is extended across the junk characters "abcd" that follow the space in both strings, producing a single five-character block. That gives ten matching characters (five in each string) and four non-matching characters (the leading "abcd" of the second string), for a ratio of 10/14 (0.7142857142857143).
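To see the anchor-then-extend behaviour directly, here is a minimal sketch that asks SequenceMatcher for the longest matching block instead of only the ratio. The strings and the first junk function come from the example above; the second matcher, which treats only the space as junk, is added here purely for comparison and is not part of the original answer.

```python
from difflib import SequenceMatcher

a, b = " abcd", "abcd abcd"

# Letters are junk: the only allowed anchor is the space (a[0], b[4]);
# the block is then extended across the junk letters that follow it.
letters_junk = SequenceMatcher(lambda x: x in "abcd", a, b)
letters_block = letters_junk.find_longest_match(0, len(a), 0, len(b))
print(letters_block)         # Match(a=0, b=4, size=5)
print(letters_junk.ratio())  # 0.7142857142857143, i.e. 2*5/14

# Space is junk: the block may not contain a space, so the longest
# junk-free block is "abcd" at the very start of b, and the leading
# space of a is left unmatched.
space_junk = SequenceMatcher(lambda x: x == " ", a, b)
space_block = space_junk.find_longest_match(0, len(a), 0, len(b))
print(space_block)           # Match(a=1, b=0, size=4)
print(space_junk.ratio())    # 0.5714285714285714, i.e. 2*4/14
```

In both cases T stays 14, because junk characters are never removed from the count; only where a match may start changes.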

In your case, T = 5 + 4 = 9. The first string "a b c" matches the second string "a bc" at indices 0, 1, and 2 (the characters "a b"). The space at index 3 of the first string has no counterpart in the second string, so it stays unmatched, but it still counts toward T. Index 4 ("c") of the first string then matches index 3 of the second string. That is 4 matched characters in each string, or 8 of your 9 characters, giving a ratio of 8/9 (0.8888888888888888). Marking the space as junk changes none of this, which is why both calls return the same value.
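The arithmetic can be checked with a short sketch: get_matching_blocks() lists the matched blocks, and summing their sizes gives M for the formula 2*M/T. The helper ratio_from_blocks is hypothetical, written only for this illustration; running it with and without the junk function shows that both matchers find the same blocks for the strings in the question.

```python
from difflib import SequenceMatcher

a, b = "a b c", "a bc"

def ratio_from_blocks(matcher: SequenceMatcher, first: str, second: str) -> float:
    """Recompute ratio() as 2*M/T from the matching blocks (illustrative helper)."""
    # The last block is a zero-size sentinel, so it adds nothing to M.
    m = sum(block.size for block in matcher.get_matching_blocks())
    t = len(first) + len(second)  # junk characters still count toward T
    return 2.0 * m / t

with_junk = SequenceMatcher(isjunk=lambda x: x == " ", a=a, b=b)
without_junk = SequenceMatcher(a=a, b=b)

# Both matchers report the same blocks and a ratio of 0.8888888888888888.
for name, matcher in [("with isjunk", with_junk), ("without isjunk", without_junk)]:
    print(name, matcher.get_matching_blocks(), ratio_from_blocks(matcher, a, b))
```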

If you want spaces ignored entirely, remove them from both strings before comparing:

>>> import difflib
>>> a, b = "a b c", "a bc"
>>> c = a.replace(' ', '')
>>> d = b.replace(' ', '')
>>> difflib.SequenceMatcher(a=c, b=d).ratio()
1.0
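str.replace(' ', '') only removes the ordinary space character; tabs, newlines and other whitespace survive it. If those should be ignored as well, one option is to drop every whitespace character with str.split() before comparing. This is a sketch, not part of the original answer, and the helper name whitespace_insensitive_ratio is hypothetical.

```python
import difflib

def whitespace_insensitive_ratio(a: str, b: str) -> float:
    """Similarity ratio of a and b after removing all whitespace (illustrative helper)."""
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines, ...), so joining the pieces drops it all.
    stripped_a = "".join(a.split())
    stripped_b = "".join(b.split())
    return difflib.SequenceMatcher(a=stripped_a, b=stripped_b).ratio()

print(whitespace_insensitive_ratio("a b c", "a bc"))    # 1.0
print(whitespace_insensitive_ratio("a\tb\nc", "a bc"))  # 1.0
print(whitespace_insensitive_ratio("a b c", "a bd"))    # 0.6666666666666666 ("ab" of "abc" vs "abd")
```

Because whitespace is removed before matching, the ratio no longer reflects spacing differences at all; non-whitespace differences are still measured as usual.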
