PHP 7 preg_replace PREG_JIT_STACKLIMIT_ERROR with simple string

Problem description

I know other people have submitted questions around this error; however, I can't see how this regex or the subject string could be any simpler.

To me this is a bug, but before submitting it to PHP I thought I'd make sure and get help to see if this can be simpler.

Here's a small test script showing 2 strings: one with 1024 x's and one with 1023:

// 1024 x's
$str = '_' . str_repeat('x', 1024);

// Outputs nothing (bug?)
echo preg_replace('/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/', '[i]${1}[/i]', $str);

echo "\n\n";

// 1023 x's
$str = '_' . str_repeat('x', 1023);

// Outputs the unchanged string as expected
echo preg_replace('/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/', '[i]${1}[/i]', $str);

As you can see, only with a slightly longer string (greater than 1024 characters) do we get an error. The strings that will be processed by this are going to be any length; they will be forum posts, news articles, etc.
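The empty output in the first case is consistent with preg_replace returning NULL on failure. A minimal sketch of how the failure could be surfaced, assuming PHP 7.0 or later; preg_error_name is a hypothetical helper written for this example, not a PHP built-in, and whether the error actually triggers depends on the PHP and PCRE versions in use:

```php
<?php
// Hypothetical helper (not part of PHP): maps a preg_last_error() code
// to the name of the matching PREG_* constant.
function preg_error_name(int $code): string
{
    $names = [
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
        PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR',
    ];

    return $names[$code] ?? 'unknown (' . $code . ')';
}

// Pattern and subject taken from the test script above.
$pattern = '/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/';
$str     = '_' . str_repeat('x', 1024);

$result = preg_replace($pattern, '[i]${1}[/i]', $str);

if ($result === null) {
    // preg_replace signals failure with NULL; preg_last_error() says why.
    echo 'preg_replace failed: ', preg_error_name(preg_last_error()), "\n";
} else {
    echo $result, "\n";
}
```

Checking for NULL before echoing avoids silently printing an empty string when the match engine gives up.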

Regex explanation

Just trying to do some markdown parsing to convert a string like _I am italic_ to a legacy version of markup we're using from our old site in certain situations. The reasons/uses aren't important. What's important is that I believe this should work just fine, and in fact it does, like, everywhere else except PHP 7.

It should match these underscores only if they delimit an independent word or sentence. It should not match the first underscore if it is preceded by any "word" character, and it should not match the last underscore if it is followed by any "word" character.

Environment: CentOS 7, PHP 7.1.6

Recommended answer

IMPORTANT NOTE:
The (.|\n)*? and (.|\r?\n)*? patterns should be avoided, as they cause too much redundant backtracking. To match any char, you can usually use . with a DOTALL flag or, in JavaScript, the [^] or [\s\S] constructs. See "How do I match any character across multiple lines in a regular expression?" for more details.
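As a small illustration of the note above in PHP, where the DOTALL flag is the s modifier, the sketch below matches the same text both ways. The subject string and the start/end delimiters are made up for the example:

```php
<?php
// Made-up subject spanning several lines.
$subject = "start\nmiddle\nend";

// Alternation-based "any char", the form the note advises against.
preg_match('/start((?:.|\n)*?)end/', $subject, $withAlternation);

// Same idea with the s (DOTALL) modifier, so that . also matches newlines.
preg_match('/start(.*?)end/s', $subject, $withDotall);

// Both capture the text between the delimiters, newlines included.
var_dump($withAlternation[1] === $withDotall[1]); // bool(true)
```

The two patterns find the same match here; the difference is in how much work the engine does to get there on long inputs.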

Current problem

The (.|\n(?!\n))*? pattern is very inefficient and causes a lot of redundant backtracking whenever it is not at the very end of the pattern (and at the very end a lazy quantifier makes no sense at all). The further to the left it sits in the pattern, the worse the performance.

Since all it does is lazily match any char other than a newline, or a newline that is not followed by another newline, you may rewrite it as .*?(?:\R(?!\R).*?)*:

'~\b_([^_\n\t ].*?(?:\R(?!\R).*?)*)_\b~'

See the regex demo.
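A brief sketch of how the rewritten pattern might be used with preg_replace. The function name to_legacy_italics and the sample inputs are hypothetical, chosen only to exercise the cases the question describes:

```php
<?php
// Hypothetical wrapper around the rewritten pattern from the answer.
// Returns NULL if preg_replace itself fails.
function to_legacy_italics(string $text): ?string
{
    $pattern = '~\b_([^_\n\t ].*?(?:\R(?!\R).*?)*)_\b~';

    return preg_replace($pattern, '[i]$1[/i]', $text);
}

// Made-up sample inputs.
$samples = [
    'This is _italic text_ here.',           // standalone phrase: converted
    "_line one\nline two_",                  // single line break inside: converted
    "_first paragraph\n\nsecond paragraph_", // blank line inside: left alone
    'snake_case_name',                       // underscores inside a word: left alone
];

foreach ($samples as $sample) {
    var_dump(to_legacy_italics($sample));
}
```

The replacement uses $1 rather than ${1}; both refer to the first capture group.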

Notes:

  • (?<=[^\w]|^) = \b because there is a _ (a word char) after the lookbehind
  • (?=[^\w]|$) = \b because there is a _ before the lookahead
  • .*?(?:\R(?!\R).*?)* - matches:
    • .*? - any 0+ chars other than line break chars, as few as possible, then
    • (?:\R(?!\R).*?)* - zero or more sequences of:
      • \R(?!\R) - a line break sequence not followed by another line break sequence (\R matches \n, \r\n or \r, among other line break sequences)
      • .*? - any 0+ chars other than line break chars, as few as possible
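The first two notes can be spot-checked with a short script. This is only a sketch over a few made-up subjects, not a proof of equivalence; underscore_offsets is a hypothetical helper that returns the byte offsets at which a pattern matches:

```php
<?php
// Hypothetical helper: byte offsets of every match of $pattern in $subject.
function underscore_offsets(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

    // Each entry of $matches[0] is [matched text, offset]; keep the offsets.
    return array_column($matches[0], 1);
}

// Made-up subjects covering start/end of string, spaces, and in-word underscores.
$subjects = ['_a', ' _a', 'a_b', '-_x', 'a_', 'a_ b'];

foreach ($subjects as $subject) {
    // Opening underscore: lookbehind form versus \b form.
    $sameLeading = underscore_offsets('/(?<=[^\w]|^)_/', $subject)
        === underscore_offsets('/\b_/', $subject);

    // Closing underscore: lookahead form versus \b form.
    $sameTrailing = underscore_offsets('/_(?=[^\w]|$)/', $subject)
        === underscore_offsets('/_\b/', $subject);

    var_dump($sameLeading && $sameTrailing);
}
```

Because _ is itself a word character, a word boundary next to it can only occur where the neighbour is a non-word character or the edge of the string, which is what the lookarounds spell out.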
