Translate the intent of this PHP regex for multiline strings into Python/Perl


Question

Below is a PHP regex intended to match (multiline) strings inside PHP or JavaScript source code (from this post), but I suspect it has issues. What is the literal Python (or else Perl) equivalent?

~'(\\.|[^'])*'|"(\\.|[^"])*"~s

  • The s modifier means the dot matches all characters, including newline; in Python that's re.compile(..., re.DOTALL).
  • I don't get the intent of the leading \\. at all. Does that reduce to . ? Do the backslashes need to be doubled to escape them in PHP?
  • Allowing, at every position, a match of either \\. or [^'] (any non-quote character) seems like total overkill to me, and may explain why this person's regex blows up. Doesn't the [^'] class already match everything that . with the s modifier does? Surely it should match newlines?
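The pieces asked about above can be checked one at a time. This is a minimal sketch that reads the pattern as a generic regex (that is, after any string-literal escaping has been resolved); the names ESCAPE_PAIR and NOT_QUOTE are made up for illustration.

```python
import re

# \\. in the regex means: one literal backslash followed by any single character.
# DOTALL (PHP's s modifier) lets that "any character" be a newline too.
ESCAPE_PAIR = re.compile(r"\\.", re.DOTALL)

# [^'] is a negated character class: any single character except a single quote.
# Negated classes match newlines with or without DOTALL.
NOT_QUOTE = re.compile(r"[^']")

print(bool(ESCAPE_PAIR.fullmatch("\\'")))    # backslash + quote
print(bool(ESCAPE_PAIR.fullmatch("\\\n")))   # backslash + newline (needs DOTALL)
print(bool(ESCAPE_PAIR.fullmatch("a")))      # no backslash, so no match
print(bool(NOT_QUOTE.fullmatch("\n")))       # newline matched by the class alone
print(bool(re.fullmatch(r".", "\n")))        # plain dot does not match a newline
```

So \\. does not reduce to a bare dot: it consumes an escape sequence as a two-character unit, and the s modifier only matters for a backslash immediately followed by a newline.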

For constructing the two versions of the regex in Python, one with single quotes and one with double quotes, one can use this two-step approach.
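The linked two-step approach is not reproduced here, so the following is only a guess at its shape: write the pattern once with a placeholder for the quote character, then fill it in for each quote and join the results with |. TEMPLATE and build_string_regex are hypothetical names, and the pattern is the one from the question, unchanged.

```python
import re

# Step 1: the question's pattern with {q} standing in for the quote character.
TEMPLATE = r"{q}(\\.|[^{q}])*{q}"

def build_string_regex(quotes=("'", '"')):
    """Step 2: fill in each quote character and join the alternatives with |."""
    alternatives = [TEMPLATE.format(q=q) for q in quotes]
    return re.compile("|".join(alternatives), re.DOTALL)

STRING_RE = build_string_regex()
print(STRING_RE.pattern)
```

This avoids having to quote both ' and " inside a single Python string literal by hand.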

NB: a simpler version of this regex can also be found in this list of PHP regex examples, under Programming: String.

Recommended answer

The regex is mostly okay, except that it doesn't reliably handle escaped quotes (i.e., \" and \'): because [^'] can also match a backslash, the regex can backtrack and treat an escaped quote as the closing one. That's easy enough to fix:

      '(?:\\.|[^'\\]+)*'|"(?:\\.|[^"\\]+)*"
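To see what the fix changes, here is a small sketch comparing the single-quote alternative of the question's regex with the corrected one. The sample inputs and the names ORIGINAL and FIXED are made up for illustration.

```python
import re

# Single-quote alternative only, to keep the comparison easy to read.
ORIGINAL = re.compile(r"'(\\.|[^'])*'", re.DOTALL)      # [^'] can also match a backslash
FIXED = re.compile(r"'(?:\\.|[^'\\]+)*'", re.DOTALL)    # backslash excluded from the class

well_formed = "x = 'it\\'s' + y"     # source text: x = 'it\'s' + y
unterminated = "x = 'abc\\' + y"     # source text: x = 'abc\' + y  (never closed)

for name, pattern in (("original", ORIGINAL), ("fixed", FIXED)):
    for text in (well_formed, unterminated):
        match = pattern.search(text)
        print(name, repr(text), "->", match.group(0) if match else None)
```

Both patterns agree on the well-formed string. On the unterminated one, the original backtracks, lets [^'] consume the backslash, and reports 'abc\' as a complete string, while the fixed pattern finds nothing.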
      

That's a "generic" regex; in Python you would usually write it as a raw string:

      r''''(?:\\.|[^'\\]+)*'|"(?:\\.|[^"\\]+)*"'''
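A minimal usage sketch for Python follows. One detail to watch: a string delimited with """ cannot end in a double quote, so the raw string here is delimited with ''' instead. STRING_LITERAL, find_string_literals, and the sample text are illustrative names, not part of the original answer.

```python
import re

# The generic regex as a raw string, delimited with ''' because it ends in a double quote.
STRING_LITERAL = re.compile(
    r''''(?:\\.|[^'\\]+)*'|"(?:\\.|[^"\\]+)*"''',
    re.DOTALL,  # equivalent of PHP's s modifier
)

def find_string_literals(source):
    """Return every quoted string literal found in source, quotes included."""
    return [m.group(0) for m in STRING_LITERAL.finditer(source)]

# Source text:  a = 'one<newline>two'; b = "say \"hi\"";
sample = "a = 'one\ntwo'; b = \"say \\\"hi\\\"\";"
for literal in find_string_literals(sample):
    print(repr(literal))
```

finditer is used rather than findall so that the whole match is returned even if capturing groups are added to the pattern later.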
      

In PHP you have to escape the backslashes to get them past PHP's string processing:

      '~\'(?:\\\\.|[^\'\\\\]+)*\'|"(?:\\\\.|[^"\\\\]+)*"~s'
      

Most of the currently popular languages have a string type that requires less escaping, support for regex literals, or both. Here's how your regex would look as a C# verbatim string:

      @"'(?:\\.|[^'\\]+)*'|""(?:\\.|[^""\\]+)*"""
      

But, formatting considerations aside, the regex itself should work in any Perl-derived flavor (and many other flavors as well).

p.s.: Notice how I added the + quantifier to your character classes. Your intuition about matching one character at a time is correct; adding the + makes a huge difference in performance. But don't let that fool you; when you're dealing with regexes, intuition seems to be wrong more often than not. :/
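The performance claim is easy to check on your own machine. This rough sketch isolates the effect of the + by keeping everything else identical. The sample input, the repeat count, and the names are assumptions chosen for illustration, and the timings will vary by engine and version, so no numbers are claimed here.

```python
import re
import timeit

# Identical patterns except for the + on the character class.
ONE_AT_A_TIME = re.compile(r"'(?:\\.|[^'\\])*'", re.DOTALL)
WITH_PLUS = re.compile(r"'(?:\\.|[^'\\]+)*'", re.DOTALL)

# Hypothetical test input: a long, well-formed single-quoted string with escapes.
sample = "'" + "abc def \\' " * 200 + "'"

def time_pattern(pattern, text=sample, repeat=200):
    """Return the seconds taken to run pattern.fullmatch(text) `repeat` times."""
    return timeit.timeit(lambda: pattern.fullmatch(text), number=repeat)

if __name__ == "__main__":
    print("one char per iteration:", time_pattern(ONE_AT_A_TIME))
    print("with + quantifier:     ", time_pattern(WITH_PLUS))
```

One caveat worth testing separately: a + nested inside a * is a shape that can backtrack heavily in some engines when the closing quote is missing, so failing inputs deserve their own timing run.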
