Regex with lookahead does not match in Python

Views: 55 | Published: 2021/7/6 20:42:16 | Tags: python-3.x, regex

Question

I wrote a regex pattern intended to capture one date and one number from a sentence, but it does not match anything.

My code is:

import re

txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'

p = re.compile(r'''Την\s? # matches Την with a possible space afterwards

               (?P<KEK_date>\d{2}/\d{2}/\d{4}) #matches a date of the given format and captures it with a named group
               
               \.+ # Allow for an arbitrary sequence of characters 
               
               (?=(κωδικ.\s?αριθμ.\s?καταχ.ριση.)|(κ\.?α\.?κ\.?:?\s*)) # defines two lookaheads, either of which suffices
               
               (?P<KEK_number>\d+) # captures a sequence of numbers''', re.I|re.VERBOSE)

p.findall(txt)

I expected it to return a list with two elements, '02/12/2013' and '110035', but it returns an empty list instead.

Recommended answer

Issues:

- \.+ matches one or more literal dots. To match arbitrary characters, use .+ (no escaping).
- (?=(κωδικ.\s?αριθμ.\s?καταχ.ριση.)|(κ\.?α\.?κ\.?:?\s*))(?P<KEK_number>\d+) can never match. A lookahead is zero-width: it checks the text at the current position without consuming it, so \d+ would have to match at the very position where the lookahead requires the letter κ. You need to convert the lookahead into a consuming pattern.
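
Both issues can be reproduced in isolation. The following is a minimal sketch that uses short made-up ASCII strings ('a---b' and 'id: 42') instead of the text from the question; the variable names are illustrative only.

```python
import re

# Issue 1: an escaped dot matches only a literal '.',
# while an unescaped dot matches any character except a line break.
escaped_dot = re.search(r'a\.+b', 'a---b')
plain_dot = re.search(r'a.+b', 'a---b')
print(escaped_dot)         # None
print(plain_dot.group())   # a---b

# Issue 2: a lookahead consumes nothing, so (\d+) must match
# at the same position where the lookahead demands 'id:'.
with_lookahead = re.search(r'(?=id:\s*)(\d+)', 'id: 42')
consuming = re.search(r'(?:id:\s*)(\d+)', 'id: 42')
print(with_lookahead)      # None
print(consuming.group(1))  # 42
```

Swapping (?=...) for the non-capturing group (?:...) is the only change between the last two patterns: the label is then consumed, and the digits are matched after it.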

I suggest fixing your pattern as follows:

p = re.compile(r'''Την\s? # matches Την with a possible space afterwards
(?P<KEK_date>\d{2}/\d{2}/\d{4}) #matches a date of the given format and captures it with a named group
.+ # Allow for an arbitrary sequence of characters 
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+ # consumes either label (non-capturing group), then whitespace
(?P<KEK_number>\d+) # captures a sequence of numbers''', re.I | re.X)


Details

- Την\s? - the string Την and an optional whitespace character
- (?P<KEK_date>\d{2}/\d{2}/\d{4}) - Group "KEK_date": a date pattern, 2 digits, /, 2 digits, / and 4 digits
- .+ - 1 or more characters other than line break characters, as many as possible
- (?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?) - a non-capturing group matching either of
  - κωδικ.\s?αριθμ.\s?καταχ.ριση. - κωδικ, any one character, an optional whitespace, αριθμ, any one character, an optional whitespace, καταχ, any one character, ριση and any one character (other than a line break)
  - | - or
  - κ\.?α\.κ\.:? - κ, an optional ., α, a ., κ, a . and then an optional :
- \s+ - 1 or more whitespace characters
- (?P<KEK_number>\d+) - Group "KEK_number": 1 or more digits
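
Because . does not match line break characters, .+ cannot bridge a date and a number that sit on different lines. A minimal sketch of that behaviour, assuming a hypothetical two-line input and a simplified ASCII label (number:) that are not part of the original question:

```python
import re

# Hypothetical input: the date and the number are on separate lines.
text = 'date 02/12/2013 first line\nnumber: 110035'
pattern = r'(\d{2}/\d{2}/\d{4}).+number:\s+(\d+)'

# Without re.DOTALL, '.+' stops at the line break, so there is no match.
single_line = re.search(pattern, text)
print(single_line)          # None

# With re.DOTALL (alias re.S), '.' also matches '\n'.
multi_line = re.search(pattern, text, re.DOTALL)
print(multi_line.groups())  # ('02/12/2013', '110035')
```

The sentence in the question is on a single line, so the suggested pattern does not need this flag.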
See the Python demo:
```
import re
txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'
p = re.compile(r'''Την\s? # matches Την with a possible space afterwards
(?P<KEK_date>\d{2}/\d{2}/\d{4}) #matches a date of the given format and captures it with a named group
.+ # Allow for an arbitrary sequence of characters 
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+ # consumes either label (non-capturing group), then whitespace
(?P<KEK_number>\d+) # captures a sequence of numbers''', re.I | re.X)
print(p.findall(txt)) # => [('02/12/2013', '110035')]
```
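
Note that findall returns a list of tuples when a pattern has more than one capturing group, so the result is a single tuple rather than a flat two-element list. If only the first occurrence is needed, the named groups can be read directly from a match object. The following is a sketch of that alternative using the same text and pattern; the variable name m is illustrative.

```python
import re

txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'

p = re.compile(r'''Την\s?                      # Την with an optional whitespace
(?P<KEK_date>\d{2}/\d{2}/\d{4})                 # the date, captured as KEK_date
.+                                              # any characters on the same line
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+  # either label, consumed, then whitespace
(?P<KEK_number>\d+)                             # the number, captured as KEK_number
''', re.I | re.X)

# search() returns the first match (or None); named groups are then
# available by name instead of by tuple position.
m = p.search(txt)
if m:
    print(m.group('KEK_date'))    # 02/12/2013
    print(m.group('KEK_number'))  # 110035
    print(m.groupdict())
```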