How do I find the shortest overlapping match using regular expressions?

Views: 52 | Published: 2021/7/6 19:36:24 | Tags: python, regex


Problem description

I'm still relatively new to regex. I'm trying to find the shortest string of text that matches a particular pattern, but am having trouble if the shortest pattern is a substring of a larger match. For example:

import re

string = "A|B|A|B|C|D|E|F|G"
my_pattern = 'a.*?b.*?c'

my_regex = re.compile(my_pattern, re.DOTALL | re.IGNORECASE)
# findall() returns only non-overlapping matches, scanning left to right
matches = my_regex.findall(string)

for match in matches:
    print(match)

prints:

A|B|A|B|C

but I'd want it to return:

A|B|C

Is there a way to do this without having to loop over each match to see if it contains a substring that matches?

Solution

Contrary to most other answers here, this can be done in a single regex using a positive lookahead assertion with a capturing group:

>>> my_pattern = '(?=(a.*?b.*?c))'
>>> my_regex = re.compile(my_pattern, re.DOTALL | re.IGNORECASE)
>>> matches = my_regex.findall(string)
>>> print(min(matches, key=len))
A|B|C

findall() returns the captured text from every position where the lookahead succeeds (here A|B|A|B|C and A|B|C), so you need min() with key=len to get the shortest one.
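
As a minimal sketch, the two steps above can be wrapped in a helper. The name shortest_overlapping_match and the choice to return None when nothing matches are assumptions made for illustration, not part of the original answer. min() raises ValueError on an empty list, so the sketch guards against that, and it reads group 1 through finditer() so that a pattern with its own capturing groups still yields plain strings.

```python
import re


def shortest_overlapping_match(pattern, text, flags=0):
    """Return the shortest overlapping match of `pattern` in `text`, or None.

    Hypothetical helper: wraps `pattern` in a lookahead with a capturing
    group so that a match attempt is made at every position in `text`.
    """
    lookahead = re.compile('(?=(' + pattern + '))', flags)
    # group(1) is the outer capturing group added above.
    candidates = [m.group(1) for m in lookahead.finditer(text)]
    # min() raises ValueError on an empty sequence, so return None instead.
    return min(candidates, key=len) if candidates else None


string = "A|B|A|B|C|D|E|F|G"
print(shortest_overlapping_match('a.*?b.*?c', string, re.DOTALL | re.IGNORECASE))
```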

How this works:

We're not matching any text in this regex, just positions in the string (which the regex engine steps through during a match attempt).
At each position, the regex engine looks ahead to see whether your regex would match at this position.
If so, it will be captured by the capturing group.
If not, it won't.
In either case, the regex engine then steps ahead one character and repeats the process until the end of the string.
Since the lookahead assertion doesn't consume any characters, all overlapping matches will be found.
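
To make the walk-through above concrete, the sketch below uses finditer() to show where each zero-width match occurs and what the capturing group holds at that position. The variable name found is illustrative only.

```python
import re

string = "A|B|A|B|C|D|E|F|G"
my_regex = re.compile('(?=(a.*?b.*?c))', re.DOTALL | re.IGNORECASE)

# Each match is zero-width (start() == end()) because the lookahead
# consumes nothing; the text of interest lives in group 1.
found = [(m.start(), m.end(), m.group(1)) for m in my_regex.finditer(string)]

for start, end, captured in found:
    print(start, end, captured)
# 0 0 A|B|A|B|C
# 4 4 A|B|C
```

The lookahead succeeds only at positions 0 and 4, the two places where an "A" is followed later by a "B" and then a "C"; at every other position it fails and nothing is captured.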
