Escape all metacharacters in Python
Question
I need to search for patterns that may contain many metacharacters. Currently I use a long regex:
prodObjMatcher=re.compile(r"""^(?P<nodeName>[\w\/\:\[\]\<\>\@\$]+)""", re.S|re.M|re.I|re.X)
(My actual pattern is very long, so I pasted only the relevant portion that I need help with.)
This is especially painful when I need to combine several such patterns in a single re.compile call.
Is there a Pythonic way to shorten the pattern?
Recommended answer
It looks like your pattern can be reduced to
r"""^(?P<nodeName>[]\w/:[<>@$]+).*?"""
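As a quick sanity check, the sketch below compiles both the character class from the question and the reduced one, using the flags from the question, and compares what they capture. The helper name node_name and the sample strings are made up for illustration.

```python
import re

FLAGS = re.S | re.M | re.I | re.X

# Pattern from the question: every punctuation character is escaped.
verbose = re.compile(r"""^(?P<nodeName>[\w\/\:\[\]\<\>\@\$]+)""", FLAGS)

# Reduced pattern from the answer: ']' comes first, nothing else is escaped.
reduced = re.compile(r"""^(?P<nodeName>[]\w/:[<>@$]+).*?""", FLAGS)


def node_name(pattern, text):
    """Return the nodeName group, or None when the pattern does not match."""
    m = pattern.match(text)
    return m.group("nodeName") if m else None


# Hypothetical inputs, not taken from the question.
samples = ["node/a:b[0]<x>@$ rest", "Plain_Name", " leading space"]
for s in samples:
    # Both patterns should capture exactly the same prefix.
    assert node_name(verbose, s) == node_name(reduced, s)
```

If the loop finishes without an AssertionError, the two patterns agree on these inputs. That is a spot check only, not a proof of equivalence.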
Note that inside a character class you never have to escape a non-word character, apart from ^, -, ], and \ (shorthand classes such as \w also keep their backslash). There are ways to keep even those, except for \, unescaped in the character class:
- ] at the start of the character class
- the hyphen (-) at the start or end of the character class
- ^ needs escaping only when it is a literal placed at the start of the character class
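The placement rules above can be tried out with a minimal sketch like the following. The variable names and test strings are illustrative only.

```python
import re

# ']' is first, '^' is not first, and '-' is last, so all three are literals.
punct = re.compile(r"[]^-]+")

# '^' first negates the class; the ']' right after it is still a literal.
not_bracket = re.compile(r"[^]]+")

print(punct.fullmatch("]^-") is not None)   # True
print(punct.search("abc") is None)          # True
print(not_bracket.match("ab]cd").group())   # ab
```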
Outside a character class, you must escape \, [, (, ), +, $, ^, *, ?, ., and |. A { also needs escaping wherever it could be read as the start of a quantifier such as {2,3}.
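When the text to match is a plain literal rather than a hand-written pattern, the standard library's re.escape can add those backslashes for you. The answer above does not mention it, so treat this as a supplementary sketch. The literal string is a made-up example.

```python
import re

# Hypothetical literal text that happens to be full of metacharacters.
literal = "total[0] = $5.00 (approx.)?"

# re.escape backslash-escapes the characters that are special in a pattern.
pattern = re.compile(re.escape(literal))

print(pattern.fullmatch(literal) is not None)                    # True
# After escaping, '.' only matches a real dot, so "5x00" is rejected.
print(pattern.fullmatch("total[0] = $5x00 (approx.)?") is None)  # True
```

Note that re.escape is meant for whole literals. It is not suitable for text that should keep some regex syntax, such as the character class discussed above.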
Note that / is not a special regex metacharacter in Python regex patterns and does not have to be escaped.
Use raw string literals when defining your regex patterns to avoid issues such as confusing the word boundary r'\b' with the backspace character '\b'.
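The difference is easy to see in a short sketch. The sample text is arbitrary.

```python
import re

text = "a cat sat"

# Raw string: the regex engine receives backslash + 'b', a word boundary.
print(re.findall(r"\bcat\b", text))   # ['cat']

# Normal string: Python turns "\b" into one backspace character (0x08)
# before the regex engine ever sees it, so nothing in text matches.
print(re.findall("\bcat\b", text))    # []

print(len(r"\b"), len("\b"))          # 2 1
```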