Translate the intent of this PHP regex for multiline strings into Python/Perl
Question
Below is a PHP regex intended to match (multiline) strings inside PHP or JavaScript source code (from this post), but I suspect it's got issues. What is the literal Python (or else Perl) equivalent of this?
~'(\\.|[^'])*'|"(\\.|[^"])*"~s
- the `s` modifier means dot matches all characters, including newline; in Python that's `re.compile(..., re.DOTALL)`
- I totally don't get the intent of the leading `\\.`. Does that reduce to `.`? Are the double backslashes there because it has to be escaped twice in PHP?
- allowing in every position a match of either `\\.` or `[^']` (any non-quote character) seems like total overkill to me, and may explain why this person's regex blows up. Doesn't the `[^']` group already match everything that `.` with the `s` modifier does? Surely it should match newlines?
- for constructing two versions of the regex, with single and with double quotes, in Python, one can use this two-step approach
NB a simpler version of this regex can also be found in this list of PHP regex examples, under Programming: String.
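The points raised in the question can be checked with a minimal Python sketch. It ports the PHP pattern character for character, on the assumption that the `~` delimiters are simply dropped and the `s` modifier maps to `re.DOTALL`. This is not the asker's two-step approach, which is not shown here. The sample input is made up for illustration.

```python
import re

# Character-for-character port of the PHP pattern ~'(\\.|[^'])*'|"(\\.|[^"])*"~s:
# the ~ delimiters are dropped and the s modifier becomes re.DOTALL.
# Two adjacent raw strings are concatenated so that neither literal has to
# contain its own delimiter quote.
LITERAL = re.compile(r"'(\\.|[^'])*'" r'|"(\\.|[^"])*"', re.DOTALL)

# DOTALL only changes what "." matches; a negated class such as [^']
# matches a newline with or without it.
print(bool(re.fullmatch(r"[^']", "\n")))          # True
print(bool(re.fullmatch(r".", "\n")))             # False
print(bool(re.fullmatch(r".", "\n", re.DOTALL)))  # True

# r"\\." does not reduce to ".": it matches a literal backslash plus any one
# character, i.e. an escape sequence such as \' inside the quoted string.
print(bool(re.fullmatch(r"\\.", "\\'")))          # True
print(bool(re.fullmatch(r"\\.", "x")))            # False

# Made-up input with a single-quoted string that spans two lines.
src = "a = 'line one\nline two'; b = \"x\";"

# The pattern has capturing groups, so findall returns one tuple per match
# holding the last character each group captured ('' if it took no part) ...
print(LITERAL.findall(src))                       # [('o', ''), ('', 'x')]
# ... whereas group(0) gives the whole quoted strings.
print([m.group(0) for m in LITERAL.finditer(src)])
```

The `findall` result is a common surprise when porting: use non-capturing `(?:...)` groups, or iterate with `finditer`, to get the matched strings themselves.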
Answer
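The regex is mostly okay, except it doesn't reliably handle escaped quotes (i.e., `\"` and `\'`). Because `[^']` also matches a backslash, the engine is free, when backtracking, to treat the `\` of an escaped quote as an ordinary character and end the string at the quote that follows it. That's easy enough to fix by excluding the backslash from the character classes, so that `\\.` is the only way to consume one:
'(?:\\.|[^'\\]+)*'|"(?:\\.|[^"\\]+)*"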
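Here is a small Python comparison of the question's pattern and the fixed one. It shows the failure that the fix removes. The input is made up: a string that is never closed and whose only later quote is escaped. Both patterns are compiled with `re.DOTALL` to mirror PHP's `s` modifier.

```python
import re

# Pattern from the question (capturing groups kept as written there).
ORIGINAL = re.compile(r"'(\\.|[^'])*'" r'|"(\\.|[^"])*"', re.DOTALL)
# Fixed pattern: the backslash is excluded from both character classes.
FIXED = re.compile(r"'(?:\\.|[^'\\]+)*'" r'|"(?:\\.|[^"\\]+)*"', re.DOTALL)

# Made-up input: an unterminated string whose only later quote is escaped.
# As text it reads  'it\'s  (six characters).
src = "'it\\'s"

# [^'] can swallow the backslash, so on backtracking the escaped quote is
# accepted as the closing one.
print(ORIGINAL.search(src).group(0))   # 'it\'
# With \ excluded, only \\. can consume it, so nothing matches.
print(FIXED.search(src))               # None
```

On a properly closed string such as `'it\'s'`, both patterns return the whole literal. The difference only shows up when the match has to backtrack.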
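That's a "generic" regex; in Python you would usually write it in the form of a raw string. Because the pattern itself ends with `"`, delimit it with `'''` rather than `"""` (with `"""` the literal would close one quote early and leave a stray `"`):
r''''(?:\\.|[^'\\]+)*'|"(?:\\.|[^"\\]+)*"'''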
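Here is a minimal usage sketch. It assumes the goal is to list the quoted literals in a piece of JavaScript; the `js_source` text and the `STRING_LITERAL` name are invented for illustration.

```python
import re

# The fixed pattern as a Python raw string. re.DOTALL stands in for PHP's
# s modifier so that \\. also accepts a backslash-newline pair.
STRING_LITERAL = re.compile(r''''(?:\\.|[^'\\]+)*'|"(?:\\.|[^"\\]+)*"''', re.DOTALL)

# Made-up JavaScript input, written as a raw string so the backslashes
# reach the regex exactly as they would appear in the source file.
js_source = r"""var a = "say \"hi\"";
var b = 'it\'s';
var c = 'first line \
second line';
"""

# No capturing groups, so findall returns the whole quoted strings.
for literal in STRING_LITERAL.findall(js_source):
    print(repr(literal))
```

Dropping `re.DOTALL` here would lose the third literal, because the `.` in `\\.` would then refuse the newline after the backslash. The negated classes match newlines either way.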
In PHP you have to escape the backslashes to get them past PHP's string processing:
'~\'(?:\\\\.|[^\'\\\\]+)*\'|"(?:\\\\.|[^"\\\\]+)*"~s'
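Python has the same two layers of escaping when the `r` prefix is left off. This pattern uses only `\\` and `\'` escapes, so the PHP string body between the `~` delimiters carries over unchanged into an ordinary single-quoted Python string. The check below confirms that both spellings hand the regex engine the same text.

```python
import re

# Raw string: the regex engine receives exactly the characters typed.
raw = r''''(?:\\.|[^'\\]+)*'|"(?:\\.|[^"\\]+)*"'''

# Ordinary string: Python first turns each \\ into \ and each \' into ',
# just as a PHP single-quoted string does, so every regex backslash is
# typed twice. The body is the PHP literal minus its ~...~s wrapper.
cooked = '\'(?:\\\\.|[^\'\\\\]+)*\'|"(?:\\\\.|[^"\\\\]+)*"'

print(raw == cooked)                  # True
print(len(r"\\."), len("\\\\."))      # 3 3

pattern = re.compile(cooked, re.DOTALL)
print(pattern.fullmatch("'it\\'s'") is not None)   # True
```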
Most of the currently-popular languages have either a string type that requires less escaping, support for regex literals, or both. Here's how your regex would look as a C# verbatim string:
@"'(?:\\.|[^'\\]+)*'|""(?:\\.|[^""\\]+)*"""
But, formatting considerations aside, the regex itself should work in any Perl-derived flavor (and many other flavors as well).
p.s.: Notice how I added the `+` quantifier to your character classes. Your intuition about matching one character at a time is correct; adding the `+` makes a huge difference in performance. But don't let that fool you; when you're dealing with regexes, intuition seems to be wrong more often than not. :/
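One caveat to the performance point goes beyond the original answer. Because `[^'\\]+` sits inside `(?:...)*`, a long run of ordinary characters can be divided between loop iterations in exponentially many ways. When the closing quote is missing, a backtracking engine such as Python's `re` may try them all. A common remedy is Friedl's "unrolling the loop" technique, sketched here as an alternative rather than as part of the answer. It accepts the same strings, but gives each character only one way to match. The names `ANSWER` and `UNROLLED` and the sample inputs are illustrative.

```python
import re

# The fixed pattern from the answer. The + inside (?:...)* lets a run of
# ordinary characters be split between loop iterations in many ways.
ANSWER = re.compile(r''''(?:\\.|[^'\\]+)*'|"(?:\\.|[^"\\]+)*"''', re.DOTALL)

# "Unrolling the loop" (Friedl): a run of ordinary characters, then any
# number of (escape sequence, run) pairs. A backslash and the character
# after it can only be consumed together by \\., and every other character
# belongs to exactly one run, so there is one way to divide the text.
UNROLLED = re.compile(
    r''''[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"''', re.DOTALL
)

# Made-up input: an escaped quote, an escaped backslash, and a
# backslash-newline inside a double-quoted string.
sample = r"""x = 'it\'s'; y = "a\\"; z = "multi \
line";"""

# On well-formed input both patterns find the same literals.
print(ANSWER.findall(sample) == UNROLLED.findall(sample))   # True

# An opening quote that is never closed: UNROLLED fails after a linear
# amount of backtracking. ANSWER is deliberately not run on this input,
# since it may backtrack for an impractically long time.
unterminated = "'" + "a" * 50_000
print(UNROLLED.search(unterminated))   # None
```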