Python RegEx that matches char followed/preceded by same char but uppercase/lowercase

Published: 2021/7/6 19:47:12 | Tags: python, regex


Problem description

I am trying to build a regex that will find: aA, Aa, bB, cC, but won't match: aB, aa, AA, aC, Ca.

- If we meet a lowercase letter, we want to check whether the next/previous letter is uppercase.
- If we meet an uppercase letter, we want to check whether the next/previous letter is lowercase.
- Pairs in which both letters are uppercase, or both are lowercase, should not be found by our regex.

I want any char to be followed/preceded by the SAME CHAR but in the other case.

Recommended answer

You can do this with the PyPI regex module (note that the pattern will also work in Java, PCRE (PHP, R, Delphi), Perl, and .NET, but not in ECMAScript (JavaScript, C++ std::regex) or RE2 (Go, Google Apps Script)) using

(\p{L})(?!\1)(?i:\1)

See the regex demo and a proof that it works in Python:

import regex
rx = r'(\p{L})(?!\1)(?i:\1)'
print([x.group() for x in regex.finditer(rx, ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca')])
# => ['aA', 'Aa', 'bB', 'cC']

The solution is based on the inline modifier group (?i:...): all chars inside it are matched case insensitively, while the rest of the pattern stays case sensitive (provided there is no other (?i) or re.I).

Details

(\p{L}) - any letter, captured into Group 1
(?!\1) - a negative lookahead that fails the match if the next char is exactly identical to the one captured in Group 1; note that the regex index is still right after the char captured with (\p{L})
(?i:\1) - a case insensitive modifier group containing a backreference to the Group 1 value. Matched case insensitively, it could match both a and A, but the preceding lookahead has already excluded the variant with the identical case (since the \1 inside the lookahead is matched case sensitively), so only the alternate case remains.
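
The three steps above can also be sketched without any regex engine, which may make the logic easier to follow. The helper below, find_case_pairs (a hypothetical name, not part of the original answer), scans adjacent characters and mirrors each step: it requires a letter, rejects an identical neighbour, and accepts a neighbour that is equal once case is ignored. It assumes that str.isalpha() and str.casefold() are close enough stand-ins for \p{L} and the engine's case-insensitive comparison, which holds for ASCII letters but may differ for some Unicode characters.

```python
def find_case_pairs(text):
    """Return non-overlapping two-letter pairs such as 'aA' or 'Bb'."""
    pairs = []
    i = 0
    while i < len(text) - 1:
        first, second = text[i], text[i + 1]
        if (
            first.isalpha()                            # (\p{L}): a letter
            and second != first                        # (?!\1): not the identical char
            and second.casefold() == first.casefold()  # (?i:\1): same char, case ignored
        ):
            pairs.append(first + second)
            i += 2  # consume both characters, as a regex match would
        else:
            i += 1
    return pairs


print(find_case_pairs(' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'))
```

Like finditer, this sketch never reuses a character in two pairs, so 'aAa' yields a single pair.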

What about a re solution?

In re, you cannot make just a part of a pattern case insensitive with a plain (?i), because (?i) makes the whole pattern case insensitive. Besides, re did not support modifier groups before Python 3.6.

You may use something like

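import re
# (?i) makes the whole pattern case insensitive, so aa and AA match as well
rx = r'(?i)([^\W\d_])(\1)'
# Keep only the matches whose two letters differ, i.e. differ in case
print([x.group() for x in re.finditer(rx, ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca') if x.group(1) != x.group(2)])
# => ['aA', 'Aa', 'bB', 'cC']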

See the Python demo.

(?i) - makes the whole regex case insensitive
([^\W\d_]) - a letter, captured into Group 1
(\1) - the same letter, captured into Group 2 (matched case insensitively, so Aa, aA, aa and AA will all match).

The if x.group(1) != x.group(2) condition filters out the unwanted matches (aa and AA), where both letters are identical.
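
On Python 3.6 and later, re accepts scoped inline flags such as (?i:...), so the pattern from the regex-module solution can be tried with the standard library as well. The sketch below assumes such a Python version and swaps \p{L}, which re does not support, for [^\W\d_]; the names PAIR_RX and find_pairs are hypothetical and not part of the original answer.

```python
import re

# Same idea as the regex-module pattern, with [^\W\d_] standing in for \p{L}.
# Assumption: Python 3.6+, where re supports scoped flags like (?i:...).
PAIR_RX = re.compile(r'([^\W\d_])(?!\1)(?i:\1)')


def find_pairs(text):
    """Return letter pairs whose two characters differ only in case."""
    return [m.group() for m in PAIR_RX.finditer(text)]


print(find_pairs(' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'))
```

With this variant no post-filtering is needed, because the case-sensitive lookahead rejects aa and AA inside the pattern itself.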
