Different ways to specify matching a newline in Python regex
Question
I found that there are different ways to match a newline in a Python regex. For example, every pattern in the code below matches a newline:
import re

text = 'abc\n123'   # renamed from str to avoid shadowing the built-in
pattern = '\n'      # print(pattern) outputs a newline
pattern2 = '\\n'    # print(pattern2) outputs \n
pattern3 = '\\\n'   # print(pattern3) outputs \ followed by a newline
pattern4 = r'\n'    # print(pattern4) outputs \n
s = re.search(pattern, text).group()
print('a' + s + 'a')
I have two questions:
pattern is a newline, while pattern2 and pattern4 are both \n (a backslash followed by n). Why does the Python regex engine produce the same pattern for different strings?
I am not sure why pattern3 also produces the same pattern. When passed to the re parser, pattern3 stands for \ + newline, so why does the re parser translate that into matching just a newline?
I am using Python 3
Recommended answer
The combination \n indicates a newline character both in Python string literals and in re expressions (https://docs.python.org/2.0/ref/strings.html).
In a regular Python string, \n is translated to a newline character. That newline character is then fed into the re parser as a literal character.
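The first case can be checked directly. This is a minimal sketch; the variable names `text`, `pattern` and `match` are chosen here for illustration and are not part of the original answer.

```python
import re

text = 'abc\n123'
pattern = '\n'  # Python has already turned this literal into one newline character

print(len(pattern))    # 1
print(repr(pattern))   # '\n'

# The re parser receives a single newline character and matches it literally.
match = re.search(pattern, text)
print(repr(match.group()))  # '\n'
```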
A double backslash in a Python string is translated to a single backslash. Therefore the string "\\n" is stored internally as the two characters \n (a backslash followed by n), and when it is sent to the re parser, the parser in turn recognizes this combination \n as indicating a newline.
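A short sketch of this second case, again using illustrative names (`text`, `pattern2`, `match2`): the string holds two characters, yet the regex built from it matches a single newline.

```python
import re

text = 'abc\n123'
pattern2 = '\\n'  # stored as two characters: a backslash, then the letter n

print(len(pattern2))    # 2
print(list(pattern2))   # ['\\', 'n']

# Here it is the re parser, not Python, that interprets \n as a newline.
match2 = re.search(pattern2, text)
print(repr(match2.group()))  # '\n'
```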
The r notation is a shortcut that avoids having to type doubled backslashes:
Backslashes are not handled in any special way in a string literal prefixed with 'r' (https://docs.python.org/2/library/re.html).
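The sketch below checks that the raw string and the doubled-backslash string are the same value. It also covers pattern3 from the question, which the answer above does not discuss. As far as I understand the re module, a backslash followed by a character that is not an ASCII letter or digit is treated as an escaped literal, so backslash + newline matches a plain newline. The variable names are illustrative.

```python
import re

text = 'abc\n123'
pattern2 = '\\n'    # backslash + n, written with a doubled backslash
pattern3 = '\\\n'   # backslash + an actual newline character
pattern4 = r'\n'    # backslash + n, written as a raw string

# The raw string is just another spelling of the same two characters.
print(pattern4 == pattern2)  # True

# pattern3 also has two characters, but the second is a real newline.
print(len(pattern3))         # 2

# The re parser reads "backslash + newline" as an escaped literal newline.
match3 = re.search(pattern3, text)
print(repr(match3.group()))  # '\n'
```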