Confused about backslashes in regular expressions


Question

I am confused by the backslash in regular expressions. Within a regex, a \ has a special meaning, e.g. \d means a decimal digit. If you put a backslash in front of the backslash, this special meaning is lost. In the regex-howto one can read:

Perhaps the most important metacharacter is the backslash, \. As in Python string literals, the backslash can be followed by various characters to signal various special sequences. It's also used to escape all the metacharacters so you can still match them in patterns; for example, if you need to match a [ or \, you can precede them with a backslash to remove their special meaning: \[ or \\.

So print(re.search('\d', '\d')) gives None, because the pattern \d matches any decimal digit but the string \d contains none.

I would now expect print(re.search('\\d', '\d')) to match \d, but the answer is still None.

Only print(re.search('\\\d', '\d')) gives the output <_sre.SRE_Match object; span=(0, 2), match='\\d'>.

Can anyone explain this?

Recommended answer

The confusion arises because the backslash character \ is used as an escape at two different levels. First, the Python interpreter itself performs substitutions for \ before the re module ever sees your string. For instance, \n is converted to a newline character, \t is converted to a tab character, and so on. To get an actual \ character, you escape it as well, so \\ gives a single \ character. If the character following the \ isn't a recognized escape character, the \ is treated like any other character and passed through, but I don't recommend depending on this (recent Python versions emit a warning for such unrecognized escapes). Instead, always escape your \ characters by doubling them, i.e. \\.
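As a small illustration of this first level, the sketch below checks how many characters a few string literals actually contain once Python has processed their escapes. The variable names are only illustrative.

```python
# Python processes these escapes before any other code sees the strings.
newline = '\n'       # recognized escape: one newline character
backslash = '\\'     # escaped backslash: one backslash character
two_chars = '\\d'    # a backslash followed by the letter d

print(len(newline), len(backslash), len(two_chars))  # 1 1 2
print(ord(backslash))  # 92, the code point of the backslash
```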

If you want to see how Python expands your string escapes, just print the string. For example:

s = 'a\\b\tc'
print(s)

If s is part of an aggregate data type, e.g. a list or a tuple, and you print that aggregate, Python will enclose the string in single quotes and include the \ escapes (in a canonical form), so be aware of how your string is being printed. If you just type a quoted string into the interpreter, it will likewise be displayed enclosed in quotes with \ escapes.
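The difference between the two displays can be seen side by side in this short sketch, which reuses the string s from the example above:

```python
s = 'a\\b\tc'

print(s)        # the actual characters: a, backslash, b, a tab, then c
print(repr(s))  # the canonical literal form: 'a\\b\tc'
print([s])      # containers show their items with repr: ['a\\b\tc']
print(len(s))   # 5
```

The doubled backslash in the last two displays is only notation; the string itself still contains a single backslash.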

Once you know what your string actually contains, you can think about what the re module will do with it. For instance, if you want to escape \ in a string you pass to the re module, you need to pass \\ to re, which means you need to write \\\\ in your quoted Python string. The Python string will end up containing \\, and the re module will treat this as a single literal \ character.
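Applied to the question, a minimal sketch of the two levels might look like the following. The names text, digit_class and literal are illustrative; the pattern '\\\d' from the question yields the same three characters as '\\\\d' here, only because \d is not a recognized string escape and passes through.

```python
import re

text = '\\d'          # two characters: a backslash and the letter d

digit_class = '\\d'   # re receives \d, the digit character class
literal = '\\\\d'     # re receives \\d: a literal backslash, then d

print(re.search(digit_class, text))  # None: text contains no digit

m = re.search(literal, text)
print(m.span())                      # (0, 2)

# re.escape builds the same pattern from the text to be matched literally.
print(re.escape(text) == literal)    # True
```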

An alternative way to include \ characters in Python strings is to use raw strings, e.g. r'a\b' is equivalent to "a\\b".
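With raw strings, only the regex level of escaping remains, so the pattern reads the way the re module will see it. A short sketch, with sample strings chosen purely for illustration:

```python
import re

print(r'a\b' == 'a\\b')   # True: same three characters
print(len(r'\\d'))        # 3: a raw string keeps both backslashes

# r'\\d' reaches re as \\d: a literal backslash followed by d.
m = re.search(r'\\d', '\\d')
print(m.group())          # \d

# r'\d' reaches re as \d: the digit character class.
n = re.search(r'\d', 'room 7')
print(n.group())          # 7
```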
