Python RegEx that matches char followed/preceded by same char but uppercase/lowercase
Question
I am trying to build a regex that will match aA, Aa, bB, cC but not aB, aa, AA, aC, Ca.
- if we meet a lowercase letter, we want to check whether the next/previous letter is uppercase
- if we meet an uppercase letter, we want to check whether the next/previous letter is lowercase
- pairs where both letters are uppercase or both are lowercase shouldn't be found by our regex
I want any char to be followed/preceded by the SAME CHAR but uppercase.
Recommended answer
You may do it with the PyPI regex module using the pattern below. Note that the same pattern also works in Java, PCRE (PHP, R, Delphi), Perl and .NET, but won't work in ECMAScript (JavaScript, C++ std::regex) or RE2 (Go, Google Apps Script):
(\p{L})(?!\1)(?i:\1)
import regex
rx = r'(\p{L})(?!\1)(?i:\1)'
print([x.group() for x in regex.finditer(rx, ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca')])
# => ['aA', 'Aa', 'bB', 'cC']
The solution is based on the inline modifier group (?i:...), inside which all chars are matched case-insensitively while the rest of the pattern stays case-sensitive (provided there is no global (?i) or IGNORECASE flag such as regex.I).
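To see the scoping in isolation, here is a minimal sketch. It uses the standard re module instead of regex so it runs without extra installs, and it assumes Python 3.6 or later, where re also accepts scoped inline flags such as (?i:...). The test strings are made up for illustration.

```python
import re

# Only the "b" inside (?i:...) is case-insensitive; "a" and "c" must match exactly.
pattern = re.compile(r'a(?i:b)c')

results = {s: bool(pattern.fullmatch(s)) for s in ['abc', 'aBc', 'Abc', 'abC']}
print(results)  # {'abc': True, 'aBc': True, 'Abc': False, 'abC': False}
```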
Details
(\p{L}) - any letter captured into Group 1
(?!\1) - a negative lookahead that fails the match if the next char is identical to the one captured in Group 1; note that the regex index is still right after the char captured with (\p{L})
(?i:\1) - a case-insensitive modifier group containing a backreference to the Group 1 value; since it matches case-insensitively it could match both a and A, BUT the preceding lookahead excludes the variant with the same case (because the \1 inside the lookahead is matched case-sensitively), so only the opposite-case letter can match
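The three steps can be mirrored in plain Python, which may help when checking the logic by hand. This is only a sketch: find_case_pairs is a hypothetical helper, str.isalpha stands in for \p{L}, and str.lower is a simplification of the regex engine's case folding that is only reliable for simple (e.g. ASCII) text.

```python
def find_case_pairs(text):
    r"""Hypothetical helper mirroring (\p{L})(?!\1)(?i:\1) on simple text."""
    pairs = []
    i = 0
    while i < len(text) - 1:
        first, second = text[i], text[i + 1]
        # Step 1, (\p{L}): the first char must be a letter.
        # Step 2, (?!\1): the next char must not be the identical char.
        # Step 3, (?i:\1): the next char must be the same letter ignoring case.
        if first.isalpha() and second != first and second.lower() == first.lower():
            pairs.append(first + second)
            i += 2  # resume after the match, like finditer does
        else:
            i += 1
    return pairs

print(find_case_pairs(' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'))
# ['aA', 'Aa', 'bB', 'cC']
```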
What about a re solution?
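In re, you cannot make just part of a pattern case-insensitive with (?i): a (?i) anywhere in the pattern applies to the whole pattern (and since Python 3.11 such a global flag must appear at the very start). Before Python 3.6, re did not support modifier groups at all; from 3.6 on it accepts scoped groups such as (?i:...), but it still has no \p{L}.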
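Assuming Python 3.6 or later (where re accepts scoped flag groups), the lookahead pattern can be ported to re by swapping \p{L} for [^\W\d_], which matches a letter in a str pattern. A minimal sketch:

```python
import re

# Assumes Python 3.6+, where re supports scoped inline flags like (?i:...).
# [^\W\d_] stands in for \p{L}, which re does not support.
rx = re.compile(r'([^\W\d_])(?!\1)(?i:\1)')

text = ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'
matches = [m.group() for m in rx.finditer(text)]
print(matches)  # ['aA', 'Aa', 'bB', 'cC']
```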
You may use something like
import re
rx = r'(?i)([^\W\d_])(\1)'
print([x.group() for x in re.finditer(rx, ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca') if x.group(1) != x.group(2)])
(?i) - makes the whole regex case-insensitive
([^\W\d_]) - a letter captured into Group 1
(\1) - the same letter captured into Group 2 (case-insensitively, so Aa, aA, aa and AA will all match)
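Printing both groups before any filtering shows what the pattern alone returns. This is an illustrative sketch using the same sample string as the snippet above:

```python
import re

rx = r'(?i)([^\W\d_])(\1)'
text = ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'

# Show both captured groups for every match, before any filtering.
pairs = [(m.group(1), m.group(2)) for m in re.finditer(rx, text)]
print(pairs)
# [('a', 'A'), ('A', 'a'), ('b', 'B'), ('c', 'C'), ('a', 'a'), ('A', 'A')]
```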
The if x.group(1) != x.group(2) condition filters out the unwanted same-case matches (aa and AA).
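One caveat, shown as a sketch: because re.finditer returns non-overlapping matches, the filter approach can miss a valid pair that overlaps a rejected one. In the made-up input 'aaA', the engine consumes 'aa' first, the filter discards it, and 'aA' is never tried. The lookahead pattern (here in its re form, assuming Python 3.6+ for the scoped (?i:...) group) rejects 'aa' without consuming it.

```python
import re

text = 'aaA'

# Filter approach: finditer consumes 'aa' first, the filter discards it,
# and the overlapping 'aA' is never tried.
filtered = [m.group() for m in re.finditer(r'(?i)([^\W\d_])(\1)', text)
            if m.group(1) != m.group(2)]

# Lookahead approach with a scoped flag group (assumes Python 3.6+).
scoped = [m.group() for m in re.finditer(r'([^\W\d_])(?!\1)(?i:\1)', text)]

print(filtered)  # []
print(scoped)    # ['aA']
```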