Regex that matches punctuation at the word boundary including underscore
Problem description
I am looking for a Python regex for a variable phrase with the following properties. (For the sake of example, let's assume the variable phrase here takes the value and. But note that I need to do this in a way that lets the thing playing the role of and be passed in as a variable, which I'll call phrase.)
Should match: this_and, this.and, (and), [and], and^, ;And, etc.
Should not match: land, andy
This is what I tried so far (where phrase is playing the role of and):
pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
This seems to work for all my requirements, except that it does not match words with underscores, e.g. _hello, hello_, hello_world.
Ideally I would like to use the standard library re module rather than any external packages.
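The underscore failure can be reproduced with a short sketch. It assumes the \b-based attempt shown earlier, compiled with re.I for case-insensitive matching instead of lowercasing the phrase; that swap is a simplification made here, not part of the original attempt. The underscore is itself a \w character, so \b sees no boundary between "_" and the letter that follows it.

```python
import re

phrase = "and"

# \b-based pattern modelled on the attempt in the question.
# Assumption: re.I replaces the original phrase.lower() call.
pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.I)

# "_" is a word character, so "_a" contains no word boundary.
underscore_is_word_char = bool(re.fullmatch(r"\w", "_"))
print(underscore_is_word_char)

# A dot is a non-word character, so a boundary exists before "and".
print(bool(pattern.search("this.and")))

# No boundary between "_" and "a", so the pattern finds nothing.
print(bool(pattern.search("this_and")))
```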
Recommended answer
You can use
r'(?<![^\W_])and(?![^\W_])'
Compile with the re.I flag to enable case-insensitive matching.
Details
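(?<![^\W_]) - the preceding character must not be a letter or digit
and - some keyword
(?![^\W_]) - the next character must not be a letter or digit
The class [^\W_] matches any word character except the underscore, that is, only letters and digits. The underscore therefore counts as a separator, just like punctuation.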
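To see what the character class used in both lookarounds accepts, here is a small check. It is only an illustration: the name alnum and the sample characters are chosen for this sketch. With str patterns in Python 3, \w is Unicode-aware, so accented letters count as letters too.

```python
import re

# Word characters minus the underscore: letters and digits only.
# "alnum" is a name chosen for this sketch.
alnum = re.compile(r"[^\W_]")

for ch in ["a", "Z", "7", "é", "_", ".", " "]:
    # Letters and digits print True; "_", "." and " " print False.
    print(repr(ch), bool(alnum.fullmatch(ch)))
```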
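import re
# Test strings from the question: the last two must not match.
strs = ['this_and', 'this.and', '(and)', '[and]', 'and^', ';And', 'land', 'andy']
phrase = "and"
# Escape the phrase and wrap it in the two lookarounds; re.I ignores case.
rx = re.compile(r'(?<![^\W_]){}(?![^\W_])'.format(re.escape(phrase)), re.I)
for s in strs:
    print("{}: {}".format(s, bool(rx.search(s))))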
Output:
this_and: True
this.and: True
(and): True
[and]: True
and^: True
;And: True
land: False
andy: False
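Since the phrase is meant to be passed in as a variable, the same pattern can be wrapped in a small helper. The function name phrase_pattern and the sample phrase "c++" are hypothetical, chosen only for this sketch. Because the lookarounds inspect the neighbouring characters rather than relying on \b, a phrase that ends in punctuation is handled as well.

```python
import re

def phrase_pattern(phrase):
    """Hypothetical helper: build the lookaround pattern for any phrase."""
    return re.compile(
        r"(?<![^\W_]){}(?![^\W_])".format(re.escape(phrase)), re.I
    )

rx = phrase_pattern("c++")

# The first two should match; the last two have a digit or letter
# directly next to the phrase and should not.
samples = ["I like C++.", "use_c++_here", "c++11", "abc++"]
for s in samples:
    print("{}: {}".format(s, bool(rx.search(s))))
```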