preg_match appears to hit a limit when using two matches

Question

I have run into an odd problem. It appears I am reaching some sort of limit with preg_match when trying to use two matches, using PHP 5.3.3.

// works fine
$pattern_1 = '?START(.*)STOP?';
$string = 'START' . str_repeat('x', 9999999) . 'STOP';
preg_match($pattern_1, $string, $matchedArray);

$pattern_2 = '?START-ONE(.*)STOP-ONE.*START-TWO(.*)STOP-TWO.*?';

// works fine
$string = 'START-ONE this is head stuff STOP-ONE  START-TWO' . str_repeat('x', 49970) . 'STOP-TWO';
preg_match($pattern_2, $string, $matchedArray_2);

// does not work
$string = 'START-ONE this is head stuff STOP-ONE  START-TWO' . str_repeat('x', 49971) . 'STOP-TWO';
preg_match($pattern_2, $string, $matchedArray_3);

The first case, with only one match, uses a very large string and has no problems.

The second case has a string length of 50,026 and works fine. The last case has a string length of 50,027 (one more) and the match no longer works. Since the exact number at which the error occurs (49971 here) can vary, it can be changed to something larger to reproduce the problem.

Any ideas or thoughts? Is this perhaps a PHP version issue? Maybe a possible workaround is to use only one match rather than two, and then run preg_match twice?

Recommended answer

OK, PHP is not very talkative about regex errors: it just returns false for the last case, which, per the PHP docs, simply tells you that an error occurred.
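Since a bare false says little, one way to find out which error occurred is preg_last_error(), which reports PREG_BACKTRACK_LIMIT_ERROR when the backtracking limit is exhausted. The following is only a minimal sketch: the helper name describe_preg_match is hypothetical, the pathological pattern is borrowed from the PHP manual's preg_last_error() example rather than from the question, and the limit is lowered (with JIT switched off) purely so the failure is easy to reproduce.

```php
<?php
// Hypothetical helper (not part of PHP): run preg_match() and describe the outcome.
function describe_preg_match($pattern, $subject)
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // preg_last_error() returns the error code of the most recent PCRE call.
        return preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR
            ? 'backtrack limit exhausted'
            : 'PCRE error code ' . preg_last_error();
    }
    return $result === 1 ? 'matched' : 'no match';
}

// Assumption for the demo: a deliberately low limit, with JIT off so that the
// interpreter's backtracking counter is the one in play.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

echo describe_preg_match('/(?:\D+|<\d+>)*[!?]/', 'foobar foobar foobar'), "\n";
echo describe_preg_match('/foo/', 'foobar'), "\n";
```

Comparing against false with === matters here, because preg_match also returns 0 for an ordinary non-match.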

I've reproduced the problem using PCRE (the regex engine used by preg_match) in C#, though with a much higher character count, and the error I'm getting is PCRE_ERROR_MATCHLIMIT.

This means you're hitting the backtracking limit set in PCRE. It's just a safety measure to prevent the engine from looping indefinitely, and I think your PHP configuration sets it to a low value.

To fix the issue, you can set a higher value for the pcre.backtrack_limit PHP option, which controls this limit:

ini_set("pcre.backtrack_limit", "10000000"); // Actually, this is PCRE's default
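If changing the setting globally is undesirable, the limit could be raised for a single call and restored afterwards. This is a sketch under assumptions, not an established idiom: preg_match_with_limit is a hypothetical helper, and the ~ delimiter replaces the question's ? delimiter. For context, the PHP manual documents the default pcre.backtrack_limit as 100000 before PHP 5.3.7 and 1000000 from 5.3.7 onward, which would be consistent with the failure appearing near 50,000 characters when two greedy groups each scan the whole string.

```php
<?php
// Hypothetical wrapper: raise pcre.backtrack_limit for one call, then restore it.
function preg_match_with_limit($pattern, $subject, &$matches, $limit)
{
    // ini_set() returns the previous value as a string, or false on failure.
    $previous = ini_set('pcre.backtrack_limit', (string) $limit);
    $result = preg_match($pattern, $subject, $matches);
    if ($previous !== false) {
        ini_set('pcre.backtrack_limit', $previous);
    }
    return $result;
}

$limitBefore = ini_get('pcre.backtrack_limit');

$string = 'START-ONE this is head stuff STOP-ONE  START-TWO'
        . str_repeat('x', 49971) . 'STOP-TWO';

// Same pattern as in the question, but with ~ as the delimiter.
$result = preg_match_with_limit(
    '~START-ONE(.*)STOP-ONE.*START-TWO(.*)STOP-TWO.*~',
    $string,
    $matches,
    10000000
);
var_dump($result);
```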

Side notes:

  • You should probably replace (.*) with (.*?), both to avoid useless backtracking and for correctness (otherwise the regex engine will run past the STOP string and have to backtrack to reach it).
  • Using ? as a pattern delimiter is a bad idea, since it prevents you from using the ? metacharacter and therefore from applying the advice above. Really, you should never use regex metacharacters as pattern delimiters.
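Both side notes can be combined into one revised pattern. The version below is only an illustration: the choice of ~ as the delimiter is an assumption (any non-metacharacter delimiter would do), and the trailing .* from the question is dropped because it does not affect the captured groups.

```php
<?php
// Lazy quantifiers stop at the first STOP marker instead of overshooting and
// backtracking; ~ is not a regex metacharacter, so ? stays usable in the pattern.
$pattern = '~START-ONE(.*?)STOP-ONE.*?START-TWO(.*?)STOP-TWO~';

$string = 'START-ONE this is head stuff STOP-ONE  START-TWO'
        . str_repeat('x', 49971) . 'STOP-TWO';

$result = preg_match($pattern, $string, $m);

if ($result === 1) {
    echo 'group 1 length: ', strlen($m[1]), "\n";
    echo 'group 2 length: ', strlen($m[2]), "\n";
} else {
    echo 'no match or error, preg_last_error() = ', preg_last_error(), "\n";
}
```

A lazy group still advances one character at a time, so very long inputs can hit the limit eventually, but the greedy pattern's repeated full-string scans are avoided.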

If you're interested in more low-level details, here's the relevant bit of the PCRE docs (emphasis mine):

The match_limit field provides a means of preventing PCRE from using up a vast amount of resources when running patterns that are not going to match, but which have a very large number of possibilities in their search trees. The classic example is a pattern that uses nested unlimited repeats.

Internally, pcre_exec() uses a function called match(), which it calls repeatedly (sometimes recursively). The limit set by match_limit is imposed on the number of times this function is called during a match, which has the effect of limiting the amount of backtracking that can take place. For patterns that are not anchored, the count restarts from zero for each position in the subject string.

When pcre_exec() is called with a pattern that was successfully studied with a JIT option, the way that the matching is executed is entirely different. However, there is still the possibility of runaway matching that goes on for a very long time, and so the match_limit value is also used in this case (but in a different way) to limit how long the matching can continue.

The default value for the limit can be set when PCRE is built; the default default is 10 million, which handles all but the most extreme cases. You can override the default by supplying pcre_exec() with a pcre_extra block in which match_limit is set, and PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit is exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.

A value for the match limit may also be supplied by an item at the start of a pattern of the form

 (*LIMIT_MATCH=d)

where d is a decimal number. However, such a setting is ignored unless d is less than the limit set by the caller of pcre_exec() or, if no such limit is set, less than the default.
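From PHP, this in-pattern item can be used to tighten the limit for a single pattern. The sketch below assumes a PHP build whose bundled PCRE recognizes (*LIMIT_MATCH=d), and it disables JIT so that the counting described above applies. The pathological pattern is again the one from the PHP manual's preg_last_error() example, and the value 1000 is arbitrary.

```php
<?php
// Assumption: JIT off, so the interpreter's match() counter is what gets limited.
ini_set('pcre.jit', '0');

// The in-pattern limit (1000) is below PHP's pcre.backtrack_limit, so it is honored.
$pattern = '/(*LIMIT_MATCH=1000)(?:\D+|<\d+>)*[!?]/';
$result = preg_match($pattern, 'foobar foobar foobar');
$error = preg_last_error();

if ($result === false && $error === PREG_BACKTRACK_LIMIT_ERROR) {
    echo "match limit hit\n";
} else {
    var_dump($result);
}
```

As the quoted documentation states, the item can only lower the limit, so it cannot substitute for raising pcre.backtrack_limit.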
