Python RegEx that matches a char followed/preceded by the same char but uppercase/lowercase


Question

I am trying to build a regex that will match aA, Aa, bB, cC but will not match aB, aa, AA, aC, Ca.

- if we meet a lowercase letter, we want to check whether the next/previous letter is uppercase
- if we meet an uppercase letter, we want to check whether the next/previous letter is lowercase
- pairs in which both letters are uppercase or both are lowercase should not be found by the regex

I want any char to be followed/preceded by the SAME CHAR but in the opposite case.

Recommended answer

You may do it with the PyPi regex module (note that the same pattern will also work in Java, PCRE (PHP, R, Delphi), Perl and .NET, but won't work in ECMAScript (JavaScript, C++ std::regex) or RE2 (Go, Google Apps Script)) using

(\p{L})(?!\1)(?i:\1)

See the regex demo. Proof that it works in Python:

import regex
rx = r'(\p{L})(?!\1)(?i:\1)'
print([x.group() for x in regex.finditer(rx, ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca')])
# => ['aA', 'Aa', 'bB', 'cC']

The solution is based on the inline modifier group (?i:...), inside which all chars are treated in a case insensitive way, while the other parts of the pattern stay case sensitive (provided there is no other (?i) or re.I).

Details

  • (\p{L}) - any letter, captured into Group 1
  • (?!\1) - a negative lookahead that fails the match if the next char is absolutely identical to the one captured in Group 1 (this \1 is compared in a case sensitive way) - note that the regex index is still right after the char captured with (\p{L})
  • (?i:\1) - a case insensitive modifier group that contains a backreference to the value of Group 1. Since it matches in a case insensitive way, on its own it could match both a and A, BUT the preceding lookahead has already excluded the identical, same-case char, so only the variant with the alternate case can match here.
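To make the three steps above concrete, here is a minimal sketch that restates them in plain Python, without any regex engine. The function name swapped_case_pairs is hypothetical, and comparing with str.lower() is only an approximation of how a regex engine compares characters case insensitively, so treat it as an illustration of the logic rather than an exact equivalent of the pattern.

```python
def swapped_case_pairs(text):
    """Return non-overlapping letter pairs such as 'aA' or 'Bb' (hypothetical helper)."""
    pairs = []
    i = 0
    while i < len(text) - 1:
        first, second = text[i], text[i + 1]
        if (
            first.isalpha()                      # (\p{L}): the first char is a letter
            and first != second                  # (?!\1): next char is not the identical char
            and first.lower() == second.lower()  # (?i:\1): same letter when case is ignored
        ):
            pairs.append(first + second)
            i += 2  # like a regex match, the pair consumes both characters
        else:
            i += 1
    return pairs


sample = ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'
print(swapped_case_pairs(sample))
```

The middle condition plays the role of the lookahead: without it, pairs such as aa and AA would also be accepted.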

What about a re solution?

In re, a global (?i) (or re.I) makes the whole pattern case insensitive, so it cannot be used to make only part of a pattern case insensitive. Besides, re in Python versions before 3.6 does not support modifier groups such as (?i:...).

You may use something like

import re
rx = r'(?i)([^\W\d_])(\1)'
print([x.group() for x in re.finditer(rx, ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca') if x.group(1) != x.group(2)])

See the Python demo.

  • (?i) - sets the whole regex case insensitive
  • ([^\W\d_]) - a letter, captured into Group 1
  • (\1) - the same letter, captured into Group 2 (case insensitive, so Aa, aA, aa and AA will all match).

The if x.group(1) != x.group(2) condition then filters out the unwanted matches (aa and AA), in which both groups hold the identical char.
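The sketch below shows what the filter actually removes, and compares it with a variant that, assuming Python 3.6 or later (where re accepts scoped inline flags such as (?i:...)), moves the check into the pattern itself. The variable names are illustrative only.

```python
import re

sample = ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'

# Global (?i): the backreference also accepts the identical character,
# so aa and AA are matched and have to be filtered out afterwards.
loose = re.compile(r'(?i)([^\W\d_])(\1)')
unfiltered = [m.group() for m in loose.finditer(sample)]
filtered = [m.group() for m in loose.finditer(sample)
            if m.group(1) != m.group(2)]

# Assumption: Python 3.6+, where re supports the scoped group (?i:...).
# The lookahead (?!\1) is case sensitive; only the last \1 ignores case.
scoped = re.compile(r'([^\W\d_])(?!\1)(?i:\1)')
scoped_matches = [m.group() for m in scoped.finditer(sample)]

print(unfiltered)
print(filtered)
print(scoped_matches)
```

One difference worth keeping in mind: with the filter approach the unwanted pair is still consumed by the match before it is discarded. On an input such as aaA, the loose pattern matches aa (which is then filtered out) and never reaches aA, whereas the scoped pattern skips the first a and should find aA.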
