Regex pattern to handle both case-sensitive and case-insensitive in a single statement

Problem description

I have a small regex problem. I have two different terms:

  1. "United States", which I would like to match ignoring the case
  2. "US", which I would like to match without ignoring case.

I want to perform the following two regex substitutions in a single substitution statement:

import re

clntxt = re.sub('(?i)United States', 'USA', "united states")
# Output: USA
clntxt = re.sub('US', 'USA', "US and us")
# Output: USA and us

I need something like

clntxt = re.sub('(?i)United States|(?s)US', 'USA', "united states and US and us")
# output: USA and USA and us

How can I achieve the above?

Solution

In older Python versions, (?i) turns on the "ignore case" flag for the entire expression. From the official documentation:

(?aiLmsux)

(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function. Flags should be used first in the expression string.
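To see why a single global flag cannot express "case-insensitive here, case-sensitive there", here is a minimal sketch (the sample string is made up for illustration). Placing (?i) at the start of the pattern behaves like passing flags=re.IGNORECASE, so US also matches the lowercase us. Note too that (?s) is the "dot matches all" flag, not a case-sensitive switch, and that from Python 3.11 on a global flag placed anywhere other than the start of the pattern raises re.error.

```python
import re

text = "United States and US and us"

# (?i) at the very start applies to the whole pattern...
inline = re.sub(r"(?i)united states|us", "USA", text)

# ...exactly as if re.IGNORECASE had been passed as an argument.
argument = re.sub(r"united states|us", "USA", text, flags=re.IGNORECASE)

print(inline)    # USA and USA and USA  (the lowercase "us" is replaced too)
print(argument)  # USA and USA and USA
```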

Since Python 3.6, however, you can toggle flags within just part of the expression:

(?imsx-imsx:...)

(Zero or more letters from the set 'i', 'm', 's', 'x', optionally followed by '-' followed by one or more letters from the same set.) The letters set or removes the corresponding flags: re.I (ignore case), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the part of the expression. (The flags are described in Module Contents.)

New in version 3.6.
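The scoped form also works in the opposite direction. This is a minimal sketch, assuming you prefer to make the whole pattern case-insensitive and switch the flag off only for US. It relies on the same Python 3.6+ syntax quoted above: a leading (?i) sets re.I globally, and (?-i:...) removes it for the enclosed part. The sample string is illustrative.

```python
import re

text = "united states and US and us"

# Global (?i) at the start is allowed; (?-i:...) turns it off locally,
# so the second alternative only matches the uppercase "US".
pattern = re.compile(r"(?i)united states|(?-i:US)")

result = pattern.sub("USA", text)
print(result)  # USA and USA and us
```

Which direction to use is mostly a matter of taste: scope the flag onto the smaller part of the pattern.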

For example, (?i:foo)bar matches foobar and FOObar but not fooBAR. So to answer your question:

>>> import re
>>> re.sub('(?i:United States)|US', 'USA', 'united states and US and us')
'USA and USA and us'
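One refinement worth considering goes beyond the original answer. The pattern replaces US wherever it appears, including inside longer uppercase tokens such as USB. The sketch below assumes only whole words should be replaced and wraps the alternation in \b word boundaries. The sample text is invented for illustration.

```python
import re

text = "USB and US and united states"

# The answer's pattern: "US" also matches the start of "USB".
plain = re.sub(r"(?i:United States)|US", "USA", text)

# Same alternation inside a non-capturing group with word boundaries,
# so only standalone words are replaced.
bounded = re.sub(r"\b(?:(?i:United States)|US)\b", "USA", text)

print(plain)    # USAB and USA and USA
print(bounded)  # USB and USA and USA
```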

Note that this only works in Python 3.6+.
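For Python versions older than 3.6, where scoped flags are unavailable, one workaround is to spell out the case-insensitive part with character classes such as [uU][nN]... and leave the rest of the pattern case-sensitive. This technique is not from the source. The helper name case_insensitive_literal is hypothetical, and the sketch avoids f-strings so it also runs on older interpreters. It only handles ASCII letters, because case mapping for other characters can change string length (for example, "ß".upper() is "SS").

```python
import re
import string


def case_insensitive_literal(literal):
    """Hypothetical helper: build a pattern that matches `literal` in any
    letter case without using the (?i) flag. ASCII letters become
    two-character classes; everything else is escaped verbatim."""
    parts = []
    for ch in literal:
        if ch in string.ascii_letters:
            parts.append("[" + ch.lower() + ch.upper() + "]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


# Only "United States" is case-insensitive; "US" stays case-sensitive.
pattern = case_insensitive_literal("United States") + "|US"

result = re.sub(pattern, "USA", "united states and US and us")
print(result)  # USA and USA and us
```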
