Python regex module fuzzy match: substitution count not as expected

Problem description

The Python module regex allows fuzzy matching.

You can specify the allowable number of substitutions (s), insertions (i), deletions (d), and total errors (e).
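
As a sketch of how the constraint syntax reads, the hypothetical helper max_counts below (not part of the regex module) turns a string such as s<7,i<3,d<3,e<8 into the largest count each kind of error may reach. It handles only the < and <= comparisons used in this question; the regex module accepts other forms (such as lower bounds and weighted cost equations) that this sketch ignores.

```python
import re

# Matches one "<kind><op><limit>" term, e.g. "s<7" or "e<=3" (hypothetical helper).
TERM = re.compile(r"^\s*([sdie])\s*(<=|<)\s*(\d+)\s*$")


def max_counts(spec: str) -> dict[str, int]:
    """Return the largest permitted count for each error kind named in spec."""
    limits = {}
    for term in spec.split(","):
        found = TERM.match(term)
        if found is None:
            raise ValueError(f"unsupported constraint term: {term!r}")
        kind, op, limit = found.group(1), found.group(2), int(found.group(3))
        # "<" is strict, so "s<7" allows at most 6 substitutions.
        limits[kind] = limit - 1 if op == "<" else limit
    return limits


print(max_counts("s<7,i<3,d<3,e<8"))
# → {'s': 6, 'i': 2, 'd': 2, 'e': 7}
```

Read this way, the question's constraints allow up to 6 substitutions, 2 insertions and 2 deletions, with no more than 7 errors in total.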

The fuzzy_counts attribute of a match object is a tuple of three counts, for example (0, 0, 0), where:

match.fuzzy_counts[0] = count for 's' 
match.fuzzy_counts[1] = count for 'i' 
match.fuzzy_counts[2] = count for 'd'
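
Because fuzzy_counts is a plain tuple, it can help to give its positions names. FuzzyCounts below is a hypothetical convenience wrapper, not part of the regex module; it assumes the (s, i, d) order listed above and derives the total error count as their sum.

```python
from typing import NamedTuple


class FuzzyCounts(NamedTuple):
    """Named view of a match's fuzzy_counts tuple (hypothetical helper)."""

    substitutions: int
    insertions: int
    deletions: int

    @property
    def errors(self) -> int:
        # Total number of errors of any kind, the quantity an "e" limit constrains.
        return self.substitutions + self.insertions + self.deletions


# With a real match object this would be FuzzyCounts(*match.fuzzy_counts).
counts = FuzzyCounts(*(6, 0, 1))
print(counts.substitutions, counts.errors)
# → 6 7
```

Applied to the question's (6, 0, 1), this gives 7 errors in total, which is still inside the e<8 limit, so the constraints alone do not reject the suspicious result.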

Problem

The deletions and insertions are counted as expected, but not the substitutions.

In the example below, the only change is a single character deleted in the query, yet the substitutions count is 6 (7 if the BESTMATCH option is removed).

How is the substitution count calculated?

I would be grateful if someone could explain how this works.

>>> import regex
>>> reference = "(TATGGGA[CT][GC]AAAG[CT]CT[AC]AA[GA]CCATGTG){s<7,i<3,d<3,e<8}"
>>> query = "TATGGACCAAAGTCTCAAGCCATGTG" 
>>> match = regex.search(reference, query, regex.BESTMATCH)
>>> print(match.fuzzy_counts)
(6, 0, 1)
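
To check the expected counts independently of the regex module, the sketch below computes a minimum-cost alignment (a Levenshtein-style dynamic program) between the pattern's items and the query, charging 1 for each substitution, insertion or deletion. It is a simplification, not how regex works internally: it aligns the pattern against the whole query instead of searching for a substring, it supports only literal characters and simple [..] classes, and the reference is given with the group parentheses and the {…} constraint stripped. Following the convention in the question, an insertion is an extra character in the text and a deletion is a pattern item missing from the text. The function names are hypothetical.

```python
def parse_items(pattern: str) -> list[frozenset[str]]:
    """Split a pattern made of literals and [..] classes into per-item sets."""
    items = []
    pos = 0
    while pos < len(pattern):
        if pattern[pos] == "[":
            end = pattern.index("]", pos)
            items.append(frozenset(pattern[pos + 1:end]))
            pos = end + 1
        else:
            items.append(frozenset(pattern[pos]))
            pos += 1
    return items


def min_edit_counts(pattern: str, text: str) -> tuple[int, int, int]:
    """Return (substitutions, insertions, deletions) of a cheapest alignment."""
    items = parse_items(pattern)
    n, m = len(items), len(text)
    # best[a][b] = (total, s, i, d) for aligning items[:a] with text[:b];
    # tuples compare by total cost first, then by fewest substitutions.
    best = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for a in range(n + 1):
        for b in range(m + 1):
            if a == 0 and b == 0:
                continue
            options = []
            if a > 0 and b > 0:
                t, s, i, d = best[a - 1][b - 1]
                if text[b - 1] in items[a - 1]:
                    options.append((t, s, i, d))          # item matches
                else:
                    options.append((t + 1, s + 1, i, d))  # substitution
            if b > 0:
                t, s, i, d = best[a][b - 1]
                options.append((t + 1, s, i + 1, d))      # extra text char: insertion
            if a > 0:
                t, s, i, d = best[a - 1][b]
                options.append((t + 1, s, i, d + 1))      # missing item: deletion
            best[a][b] = min(options)
    _, s, i, d = best[n][m]
    return s, i, d


reference = "TATGGGA[CT][GC]AAAG[CT]CT[AC]AA[GA]CCATGTG"
query = "TATGGACCAAAGTCTCAAGCCATGTG"
print(min_edit_counts(reference, query))
# → (0, 0, 1)
```

The cheapest alignment drops one G from the GGG run and matches every other item, which is the (0, 0, 1) the question expected.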

Recommended answer

This was caused by what appears to be a bug in the regex module's cost calculation. The bug is still present in regex 2015.10.05 but was fixed in the next release, 2015.10.22, as shown below:

$ sudo pip3 install regex==2015.10.05
Processing /root/.cache/pip/wheels/24/cb/ae/9653e30c8f801544a645e17d26fa6803aeaf76ad0482663c27/regex-2015.10.5-cp38-cp38-linux_x86_64.whl
Installing collected packages: regex
Successfully installed regex-2015.10.5
$ python3 -c 'import regex; reference = "(TATGGGA[CT][GC]AAAG[CT]CT[AC]AA[GA]CCATGTG){s<7,i<3,d<3,e<8}"; query = "TATGGACCAAAGTCTCAAGCCATGTG"; match = regex.search(reference, query, regex.BESTMATCH);print(match.fuzzy_counts)'
(5, 0, 1)
$ sudo pip3 install regex==2015.10.22
Processing /root/.cache/pip/wheels/60/f6/9a/23e723633e62a79064cb301c54a3b50482b8c690f86c9983ee/regex-2015.10.22-cp38-cp38-linux_x86_64.whl
Installing collected packages: regex
  Found existing installation: regex 2015.10.5
    Uninstalling regex-2015.10.5:
      Successfully uninstalled regex-2015.10.5
Successfully installed regex-2015.10.22
$ python3 -c 'import regex; reference = "(TATGGGA[CT][GC]AAAG[CT]CT[AC]AA[GA]CCATGTG){s<7,i<3,d<3,e<8}"; query = "TATGGACCAAAGTCTCAAGCCATGTG"; match = regex.search(reference, query, regex.BESTMATCH);print(match.fuzzy_counts)'
(0, 0, 1)
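
To tell whether an installed copy predates the fix, one option is to compare its distribution version with 2015.10.22, the first release the answer reports as fixed. This is a sketch assuming date-style release strings (pip normalizes 2015.10.05 to 2015.10.5, as the log above shows). It reads the distribution version through importlib.metadata, which is what pip reports, rather than the module's __version__ attribute, whose numbering may differ. The helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# First regex release that the answer reports as fixing the substitution count.
FIXED_RELEASE = (2015, 10, 22)


def has_fuzzy_count_fix(release: str) -> bool:
    """True if a date-style release string such as '2015.10.5' is at or after the fix."""
    parts = tuple(int(part) for part in release.split(".")[:3])
    return parts >= FIXED_RELEASE


if __name__ == "__main__":
    try:
        installed = version("regex")
    except PackageNotFoundError:
        print("regex is not installed")
    else:
        print(installed, "includes fix" if has_fuzzy_count_fix(installed) else "predates fix")
```

Comparing integer tuples rather than raw strings matters here: as strings, "2015.10.5" would sort after "2015.10.22".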

Given these dates, I infer that the commit that fixed the bug was https://bitbucket.org/mrabarnett/mrab-regex/commits/296c1daf86619039c6fe55868e7d861097d01aae, with the description:

Hg issue 161: Unexpected fuzzy match results

Fixed the bug and did some related tidying up.

The issue it references is https://bitbucket.org/mrabarnett/mrab-regex/issues/161