PHP PREG_JIT_STACKLIMIT_ERROR - inefficient regex


Question

I am getting a PREG_JIT_STACKLIMIT_ERROR in the preg_replace_callback() function when working with a somewhat longer string. Above 2000 characters it stops working (that is, when the part matching the regex exceeds 2000 characters, not when the whole string does).
I've already read that it's caused by an inefficient regex, but I can't make my regex any simpler. Here's my regex:

/\{@([a-z0-9_]+)-((%?[a-z0-9_]+(:[a-z0-9_]+)*)+)\|(((?R)|.)*)@\}/Us

It should match strings like these:

1){@if-statement|echo this|echo otherwise@}

2){@if-statement:sub|echo this|echo otherwise@}

3){@if-statement%statament2:sub|echo this@}

and nested ones like this:

4) {@if-statement|echo this| {@if-statement2|echo this|echo otherwise@} @}

I tried to simplify it to:

/\{@([a-z0-9_]+)-([a-z0-9_]+)\|(((?R)|.)*)@\}/Us

But it looks like the error is caused by the (((?R)|.)*) part. Any advice?

Test code:

$string = '{@if-is_not_logged_homepage|
<header id="header_home">
    <div class="in">
        <div class="top">
            <h1 class="logo"><a href="/"><img src="/img/logo-home.png" alt=""></a></h1>
            <div class="login_outer_wrapper">
                <button id="login"><div class="a"><i class="stripe"><i></i></i>Log in</div></button>
                <div id="login_wrapper">
                    <form method="post" action="{^login^}" id="form_login_global">
                        <div class="form_field no_description">
                            <label>{!auth:login_email!}</label>
                            <div class="input"><input type="text" name="form[login]"></div>
                        </div>
                        <div class="form_field no_description password">
                            <label>{!auth:password!}</label>
                            <div class="input"><input type="password" name="form[password]"></div>
                        </div>
                        <div class="remember">
                            <input type="checkbox" name="remember" id="remember_me_check" checked>
                            <label for="remember_me_check"><i class="fa fa-check" aria-hidden="true"></i>Remember</label>
                        </div>
                        <div class="submit_box">
                            <button class="btn btn_check">Log in</button>
                        </div>
                    </form>
                </div>
            </div>
        </div>
        <div class="content clr">
            <div class="main_menu">
                <a href="">
                    <i class="ico a"><i class="fa fa-lightbulb-o" aria-hidden="true"></i></i>
                    <span>Idea</span>
                    <div>&nbsp;</div>
                </a>
                <a href="">
                    <i class="ico b"><i class="fa fa-user" aria-hidden="true"></i></i>
                    <span>FFa</span>
                </a>
                <a href="">
                    <i class="ico c"><i class="fa fa-briefcase" aria-hidden="true"></i></i>
                    <span>Buss</span>
                </a>
            </div>
            <div class="text_wrapper">

                <div>
                    <div class="register_wrapper">
                        <a id="main_register" class="btn register">Załóż konto</a>
                        <form method="post" action="{^login^}" id="form_register_home">
                            <div class="form_field no_description">
                                <label>{!auth:email!}</label>
                                <div class="input"><input type="text" name="form2[email]"></div>
                            </div>
                            <div class="form_field no_description password">
                                <label>{!auth:password!}</label>
                                <div class="input tooltip"><input type="password" name="form2[password]"><i class="fa fa-info-circle tooltip_open" aria-hidden="true" title="{!auth:password_format!}"></i></div>

                            </div>
                            <div class="form_field terms no_description">
                                <div class="input">
                                    <input type="checkbox" name="form2[terms]" id="terms_check">
                                    <label for="terms_check"><i class="fa fa-check" aria-hidden="true"></i>Agree</label>
                                </div>
                            </div>
                            <div class="form_field no_description">
                                <div class="input captcha_wrapper">
                                    <div class="g-recaptcha" data-sitekey="{%captcha_public_key%}"></div>
                                </div>
                            </div>
                            <div class="submit_box">
                                <button class="btn btn_check">{!auth:register_btn!}</button>
                            </div>
                        </form>
                    </div>
                </div>
            </div>
        </div>
    </div>
</header>
@}';

$if_counter = 0;

$parsed_view = preg_replace_callback( '/\{@([a-z0-9_]+)-((%?[a-z0-9_]+(:[a-z0-9_]+)*)+)\|(((?R)|.)*)@\}/Us',
        function( $match ) use( &$if_counter ){
            return '<-{'. ( $if_counter ++ ) .'}->';
        }, $string );


var_dump($parsed_view); // NULL

Recommended answer

What is PCRE JIT?

Just-in-time compiling is a heavyweight optimization that can greatly speed up pattern matching. However, it comes at the cost of extra processing before the match is performed. Therefore, it is of most benefit when the same pattern is going to be matched many times.

How does it basically work?

PCRE (and JIT) is a recursive, depth-first engine, so it needs a stack where the local data of the current node is pushed before checking its child nodes... When the compiled JIT code runs, it needs a block of memory to use as a stack. By default, it uses 32K on the machine stack. However, some large or complicated patterns need more than this. The error PCRE_ERROR_JIT_STACKLIMIT is given when there is not enough stack.

From the first quote you can see that JIT is an optional feature that is on by default in PHP [v7.*] PCRE. So you can easily turn it off with pcre.jit = 0 (it's not recommended, though).
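As a rough sketch of that switch applied at runtime rather than in php.ini: pcre.jit can be changed with ini_set(). The helper name replaceWithoutJit is made up for illustration, and the sketch assumes the setting takes effect for the pattern being run, which is easiest to guarantee if the pattern has not been used earlier in the same request. A short input is used so the example stays small.

```php
<?php
// Hypothetical helper (not part of PHP): runs preg_replace_callback()
// with the PCRE JIT switched off, then restores the previous setting.
function replaceWithoutJit(string $pattern, callable $callback, string $subject): ?string
{
    // ini_set() returns the old value, or false if the setting could not be changed
    // (for example when PCRE was built without JIT support).
    $previous = ini_set('pcre.jit', '0');

    $result = preg_replace_callback($pattern, $callback, $subject);

    if ($previous !== false) {
        ini_set('pcre.jit', $previous);
    }

    return $result; // still null if some other PCRE limit was hit
}

$if_counter = 0;
$pattern = '/\{@([a-z0-9_]+)-((%?[a-z0-9_]+(:[a-z0-9_]+)*)+)\|(((?R)|.)*)@\}/Us';

$parsed_view = replaceWithoutJit(
    $pattern,
    function ($match) use (&$if_counter) {
        return '<-{' . ($if_counter++) . '}->';
    },
    '{@if-statement|echo this|echo otherwise@}'
);

var_dump($parsed_view); // string(7) "<-{0}->"
```

Without JIT the pattern runs in the PCRE interpreter, which is slower and is still subject to pcre.backtrack_limit and pcre.recursion_limit.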

However, receiving error code #6 from a preg_* function means that JIT has possibly hit its stack size limit.
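A minimal sketch of how that error code can be surfaced instead of silently getting NULL back. The helper pregErrorName is a made-up name; the constants are the standard PREG_* error constants, and PREG_JIT_STACKLIMIT_ERROR is the one with value 6.

```php
<?php
// Hypothetical helper: translate preg_last_error() codes into their constant names.
function pregErrorName(int $code): string
{
    $names = [
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
        PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR', // value 6
    ];

    return $names[$code] ?? 'unknown (' . $code . ')';
}

$result = preg_replace_callback('/\d+/', function ($m) { return '#'; }, 'a1b22');

if ($result === null) {
    // preg_replace_callback() returns null on failure; the reason is in preg_last_error().
    echo 'PCRE failed: ', pregErrorName(preg_last_error()), PHP_EOL;
} else {
    echo $result, PHP_EOL; // a#b#
}
```

Checking the return value this way after every preg_* call that uses a recursive pattern makes a JIT stack overflow visible as soon as it happens.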

Capturing groups consume more memory than non-capturing groups (and, depending on the type of quantifier applied to a group, even more memory may be used):

  1. Capturing group OP_CBRA (pcre_jit_compile.c:#1138) - (real memory is more than this):

case OP_CBRA:
case OP_SCBRA:
bracketlen = 1 + LINK_SIZE + IMM2_SIZE;
break;

  2. Non-capturing group OP_BRA (pcre_jit_compile.c:#1134) - (real memory is more than this):

case OP_BRA:
bracketlen = 1 + LINK_SIZE;
break;

Therefore, changing the capturing groups in your own regex to non-capturing groups makes it produce the proper output (I don't know exactly how much memory that saves).
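One possible way to write that change, as a sketch only: the helper groups of the question's pattern become (?: ... ), and just the name, the condition and the body stay captured. Which groups to keep is my assumption about what the callback needs, and whether this stays under the JIT stack limit for a given input depends on the input and on the PHP/PCRE build.

```php
<?php
// The question's pattern with the helper groups made non-capturing.
// Remaining captures: 1 = name, 2 = condition, 3 = body.
$pattern = '/\{@([a-z0-9_]+)-((?:%?[a-z0-9_]+(?::[a-z0-9_]+)*)+)\|((?:(?R)|.)*)@\}/Us';

preg_match($pattern, '{@if-statement:sub|echo this|echo otherwise@}', $m);

var_dump($m[1]); // string(2) "if"
var_dump($m[2]); // string(13) "statement:sub"
var_dump($m[3]); // string(24) "echo this|echo otherwise"
```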

But it seems you need the capturing groups. In that case you should rewrite your regex for the sake of performance. Backtracking is nearly everything that has to be considered in a regex.

Solution:

(?(DEFINE)
  (?<recurs>
    (?! {@|@} ) [^|] [^{@|\\]* ( \\.[^{@|\\]* )* | (?R)
  )
)
{@
(?<If> \w+)-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^{@|]*+ (?&recurs)* )
(?<False> [|] (?&recurs)* )?
\s*@}

Live demo

PHP code (watch the backslash escaping):

preg_match_all('/(?(DEFINE)
  (?<recurs>
    (?! {@|@} ) [^|] [^{@|\\\\]* ( \\\\.[^{@|\\\\]* )* | (?R)
  )
)
{@
(?<If> \w+ )-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^{@|]*+ (?&recurs)* )
(?<False> [|] (?&recurs)* )?
\s*@}/x', $string, $matches);
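A short usage sketch for the call above, run against the first sample string from the question, to show what ends up in the named groups. The variable names are only illustrative; note that the True and False groups include their leading | character.

```php
<?php
$pattern = '/(?(DEFINE)
  (?<recurs>
    (?! {@|@} ) [^|] [^{@|\\\\]* ( \\\\.[^{@|\\\\]* )* | (?R)
  )
)
{@
(?<If> \w+ )-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^{@|]*+ (?&recurs)* )
(?<False> [|] (?&recurs)* )?
\s*@}/x';

$string = '{@if-statement|echo this|echo otherwise@}';

$count = preg_match_all($pattern, $string, $matches);

if ($count === false) {
    // A failure such as PREG_JIT_STACKLIMIT_ERROR would show up here.
    echo 'PCRE error code: ', preg_last_error(), PHP_EOL;
} else {
    var_dump($matches['If'][0]);        // string(2) "if"
    var_dump($matches['Condition'][0]); // string(9) "statement"
    var_dump($matches['True'][0]);      // string(10) "|echo this"
    var_dump($matches['False'][0]);     // string(15) "|echo otherwise"
}
```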

This is your own regex, optimized to take the fewest backtracking steps. So whatever your own regex was supposed to match is matched by this one too.

Regex that does not follow nested if blocks:

{@
(?<If> \w+)-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^|\\]* (?: \\.[^|\\]* )* )
(?<False> [|] \X*)?
@}

Live demo

Most of the quantifiers are written possessively (which avoids backtracking) by appending + to them.
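A tiny illustration of the difference, using throwaway patterns that are not part of the solution: a greedy quantifier gives characters back when the rest of the pattern fails, while a possessive one keeps everything it matched.

```php
<?php
// Greedy \w+ first takes "aaa", then gives one "a" back so the final a can match.
$greedy = preg_match('/^\w+a$/', 'aaa');

// Possessive \w++ takes "aaa" and never gives anything back, so the final a cannot match.
$possessive = preg_match('/^\w++a$/', 'aaa');

var_dump($greedy);     // int(1)
var_dump($possessive); // int(0)
```

The possessive form fails faster because the engine has no saved positions to retry, which is also why it needs less stack.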
