Regular expression quoting in Python

Question

How should I declare a regular expression?

mergedData = re.sub(r'\$(.*?)\$', readFile, allData)

I'm wondering why the following also worked. I thought I needed to use r'' to pass a regular expression.

mergedData = re.sub("\$(.*?)\$", readFile, allData)

What does "\$" evaluate to in this case, and why? I would have expected "$".

Recommended answer

> I thought that I need to use the r'' to pass a regular expression.

An r before a string literal marks it as a raw string: the usual escape sequences such as \n or \r are no longer treated as a newline or a carriage return, but simply as \ followed by n or r. To specify a single \, you write just \ in a raw string literal, whereas in a normal string literal you must double it to \\. This is why regular expressions are usually written as raw strings (see footnote 1): it reduces confusion when reading the code. With a normal string literal you would have to escape twice, once for the string literal and once more for the regex.
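As a minimal sketch of the difference (the variable names and the sample path are illustrative, not taken from the question), the following compares a normal and a raw literal, then shows the double escaping needed to match one literal backslash:

```python
import re

# In a normal literal, \n is a single newline character.
normal = "\n"
# In a raw literal, \n stays as two characters: a backslash and the letter n.
raw = r"\n"
print(len(normal), len(raw))

# A raw \n and a normal \\n spell the same two-character string.
print(raw == "\\n")

# Matching one literal backslash: the regex itself needs \\, so a normal
# literal takes four backslashes while a raw literal takes two.
path = "C:\\temp"  # the actual text is C:\temp
matches_normal = re.findall("\\\\", path)
matches_raw = re.findall(r"\\", path)
print(matches_normal == matches_raw)
```

Both calls to re.findall receive the same two-character pattern; the raw literal just makes it easier to read.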

> What does "\$" result in in this case? Why? I would have thought "$".

In a normal Python string literal, if \ is followed by a character that does not form a recognized escape sequence, the \ is preserved. Therefore "\$" results in \ followed by $, which is exactly the same string as r'\$'.
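A quick check of this, as a sketch: read_file below is a hypothetical stand-in for the question's readFile callback, and all_data is made-up sample text. Note that recent Python versions emit a warning for unrecognized escapes such as "\$" (a DeprecationWarning since 3.6, a SyntaxWarning since 3.12), which is one more reason to prefer the raw form.

```python
import re

# "\$" is not a recognized escape sequence, so Python keeps the backslash.
# (Newer Python versions warn about this spelling.)
plain = "\$"
raw = r"\$"
print(plain == raw, len(plain))

def read_file(match):
    # Hypothetical stand-in for readFile: returns a placeholder
    # instead of reading the file named between the dollar signs.
    return "<contents of " + match.group(1) + ">"

all_data = "header $a.txt$ middle $b.txt$ footer"

# Both spellings hand the same pattern to the regex engine.
merged_raw = re.sub(r"\$(.*?)\$", read_file, all_data)
merged_plain = re.sub("\$(.*?)\$", read_file, all_data)
print(merged_raw)
print(merged_raw == merged_plain)
```

The two substitutions produce the same result because the regex engine sees an escaped dollar sign in both cases, not an end-of-line anchor.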

This differs slightly from how C/C++ or JavaScript handle a similar situation: there the \ is typically treated as an escape for the next character, and only that character remains. So "\$" in those languages is interpreted as $.

Footnotes

1: There is a small defect in the design of raw strings in Python, though. See the related question: Why can't Python's raw string literals end with a single backslash?
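To illustrate that limitation, a small sketch: the source text is compiled from a string so that the syntax error can be caught at run time, and the workaround shown (concatenating a normal literal) is one common option rather than the only one.

```python
# Build the source text r"abc\" as a string. In a raw literal the backslash
# still keeps the following quote from closing the literal, so the string
# is never terminated.
source = 'r"abc' + "\\" + '"'

try:
    compile(source, "<example>", "eval")
    ends_with_backslash_ok = True
except SyntaxError:
    ends_with_backslash_ok = False

print(ends_with_backslash_ok)

# One workaround: append the trailing backslash from a normal literal.
path = r"C:\temp" + "\\"
print(path)
```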
