Python regex - r prefix

Question

Can anyone explain why example 1 below works, when the r prefix is not used? I thought the r prefix must be used whenever escape sequences are used. Example 2 and example 3 demonstrate this.

# example 1
import re
print (re.sub('\s+', ' ', 'hello     there      there'))
# prints 'hello there there' - not expected as r prefix is not used

# example 2
import re
print (re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello     there      there'))
# prints 'hello     there' - as expected as r prefix is used

# example 3
import re
print (re.sub('(\b\w+)(\s+\1\b)+', '\1', 'hello     there      there'))
# prints 'hello     there      there' - as expected as r prefix is not used

Recommended answer

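Because a backslash begins an escape sequence only when what follows it forms a valid escape sequence. \s is not a recognized escape, so the backslash is left in the string and the regex engine sees it; \b and \1 are recognized (backspace and an octal escape), so they are translated before the regex engine ever sees them.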

>>> '\n'
'\n'
>>> r'\n'
'\\n'
>>> print '\n'


>>> print r'\n'
\n
>>> '\s'
'\\s'
>>> r'\s'
'\\s'
>>> print '\s'
\s
>>> print r'\s'
\s
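
The three examples from the question can be checked against this rule. The following is a minimal sketch in Python 3 syntax; the variable names are illustrative, and the non-raw patterns are spelled with explicit `\\` and `\x08` so the result does not depend on how a particular Python version reports unrecognized escapes.

```python
import re

text = 'hello     there      there'

# '\s' is not a recognized escape, so the backslash stays in the string:
# the pattern in example 1 has the same characters as r'\s+'.
pattern_1 = '\\s+'
result_1 = re.sub(pattern_1, ' ', text)

# Example 2: raw strings hand \b, \w, \s and \1 to the regex engine untouched.
result_2 = re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', text)

# Example 3: in a normal literal '\b' becomes a backspace (0x08) and '\1'
# becomes the character 0x01, so the pattern looks for control characters
# that never occur in the text and nothing is replaced.
pattern_3 = '(\x08\\w+)(\\s+\x01\x08)+'
result_3 = re.sub(pattern_3, '\x01', text)

print(repr(result_1))
print(repr(result_2))
print(repr(result_3))
```

Example 1 works by luck rather than by design: it relies on \s not being in the list of recognized escapes, which is why using the r prefix for every regex pattern is the safer habit.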

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:

Escape Sequence   Meaning
\newline          Ignored
\\                Backslash (\)
\'                Single quote (')
\"                Double quote (")
\a                ASCII Bell (BEL)
\b                ASCII Backspace (BS)
\f                ASCII Formfeed (FF)
\n                ASCII Linefeed (LF)
\N{name}          Character named name in the Unicode database (Unicode only)
\r                ASCII Carriage Return (CR)
\t                ASCII Horizontal Tab (TAB)
\uxxxx            Character with 16-bit hex value xxxx (Unicode only)
\Uxxxxxxxx        Character with 32-bit hex value xxxxxxxx (Unicode only)
\v                ASCII Vertical Tab (VT)
\ooo              Character with octal value ooo
\xhh              Character with hex value hh
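
As a quick check of the table, the sketch below maps a few of the recognized escapes to their code points. It assumes Python 3, where ordinary string literals are Unicode, so the entries marked "Unicode only" work without a u prefix; the dictionary and list names are made up for the illustration.

```python
# Each recognized escape collapses to a single character in a normal literal.
recognized = {
    'BEL': '\a', 'BS': '\b', 'FF': '\f', 'LF': '\n',
    'CR': '\r', 'TAB': '\t', 'VT': '\v',
}
code_points = {name: ord(ch) for name, ch in recognized.items()}
print(code_points)

# Octal, hex and Unicode forms can all name the same character.
same_letter = ['\101', '\x41', '\u0041', '\U00000041',
               '\N{LATIN CAPITAL LETTER A}']
print(same_letter)

# With the r prefix nothing is translated: backslash and letter both remain.
print(len('\n'), len(r'\n'))
```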

Never rely on raw strings for path literals, as raw strings have some rather peculiar inner workings, known to have bitten people in the ass:

When an "r" or "R" prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase "n". String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.
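
The quote-escaping and backslash-newline rules in that passage can be observed with a small sketch. It assumes Python 3 and uses `ast.literal_eval` only as a convenient way to evaluate literal source text that contains a real newline; the variable names are illustrative.

```python
import ast

# In a raw string a backslash still "protects" the quote, but both the
# backslash and the quote end up in the resulting string.
escaped_quote = r"\""
print(len(escaped_quote))

# Source text of two literals that contain a backslash followed by a
# real newline character.
raw_source = 'r"a\\\nb"'
plain_source = '"a\\\nb"'

# Raw literal: backslash and newline are both kept as part of the string.
raw_value = ast.literal_eval(raw_source)
# Normal literal: backslash-newline is ignored entirely.
plain_value = ast.literal_eval(plain_source)

print(repr(raw_value))
print(repr(plain_value))
```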

To better illustrate this last point:

>>> r'\'
SyntaxError: EOL while scanning string literal
>>> r'\''
"\\'"
>>> '\'
SyntaxError: EOL while scanning string literal
>>> '\''
"'"
>>> 
>>> r'\\'
'\\\\'
>>> '\\'
'\\'
>>> print r'\\'
\\
>>> print r'\'
SyntaxError: EOL while scanning string literal
>>> print '\\'
\
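
For the path-literal warning above, here is one possible way to sidestep the trailing-backslash restriction. This is a sketch in Python 3, not the answer author's own recommendation: the path `C:\new\table` is a made-up example, and `pathlib.PureWindowsPath` is just one of several options.

```python
from pathlib import PureWindowsPath

# Without the r prefix, \n and \t in a path turn into newline and tab.
broken = 'C:\new\table'
intact = r'C:\new\table'
print(len(broken), len(intact))

# A raw literal cannot end in a backslash, so append an escaped one instead.
with_trailing = r'C:\new\table' + '\\'
print(with_trailing)

# Forward slashes avoid escaping entirely; PureWindowsPath renders them
# with backslashes when converted to a string.
as_path = str(PureWindowsPath('C:/new/table'))
print(as_path == intact)
```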
