Unexpected match of regex


Question

I expect the regex pattern ab{,2}c to match only an a followed by 0, 1 or 2 b's, followed by a c.

It works that way in many languages, for instance Python. However, in R:

grepl("ab{,2}c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
# [1]  TRUE  TRUE  TRUE  TRUE FALSE

I'm surprised by the 4th TRUE. In ?regex, I read:

{n,m} The preceding item is matched at least n times, but not more than m times.

So I agree that {,2} has to be written as {0,2} to be a valid pattern in R (unlike in Python, where the docs state explicitly that omitting the lower bound specifies a lower bound of zero).
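For comparison, the Python behavior referred to above can be checked with a minimal sketch. It uses the standard re module; the names candidates and matches are illustrative helpers, not part of the original question.

```python
import re

# The same test strings as in the R example.
candidates = ["ac", "abc", "abbc", "abbbc", "abbbbc"]


def matches(pattern, strings):
    """Return one boolean per string: does the pattern match anywhere in it?"""
    return [re.search(pattern, s) is not None for s in strings]


# In Python, an omitted lower bound is read as zero,
# so both spellings should give the same result.
omitted = matches(r"ab{,2}c", candidates)
explicit = matches(r"ab{0,2}c", candidates)

print(omitted)   # [True, True, True, False, False]
print(explicit)  # [True, True, True, False, False]
```

Here the fourth string, abbbc, is rejected, which is the result the question expected from R as well.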

But then using {,2} should throw an error instead of returning misleading matches! Am I missing something, or should this be reported as a bug?

Recommended answer

The behavior with {,2} is not expected; it is a bug. If you look at the tre_parse_bound function in the TRE source code, you will see that the min variable is set to -1 before the engine tries to initialize the minimum bound. It seems that when the minimum value is missing from the quantifier, the number of "repeats" becomes the maximum value + 1 (as if the repeat count were max - min = max - (-1) = max + 1).

So, a{,} matches one occurrence of a. The same goes for a{, } or a{ , }. See the R demo below, where only abc is matched by ab{,}c:

grepl("ab{,}c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
grepl("ab{, }c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
grepl("ab{ ,   }c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
## => [1] FALSE  TRUE FALSE FALSE FALSE
