"Regular Expression is too large" error in PHP

Problem description

I am working on a relatively complex and very large regular expression. It is currently 41,127 characters long and may grow somewhat as more cases are added. I have started getting this error in PHP:

preg_match_all(): Compilation failed: regular expression is too large at offset 41123

Is there a way to increase the size limit? The following settings, suggested elsewhere, did NOT work, because they limit backtracking and recursion while matching against the data and do not affect the allowed size of the regex itself:

ini_set("pcre.backtrack_limit", "100000000");
ini_set("pcre.recursion_limit", "100000000");

Alternatively, is there a way to define a "sub-pattern variable" within the regex that could be reused at various places in the same regex? (I am not talking about repetition using * or +, or about repeating already-matched text with a backreference such as "\1".) I am currently using PHP variables that hold sub-patterns and interpolating them in several places, but this expands the regex BEFORE it is passed to the PCRE functions, so it does nothing to reduce its size.

This is a complex regex that cannot be replaced with strpos. I would also prefer not to split it into sub-expressions at | and match those separately: the reduction in size would be modest (there are only 2 or 3 top-level | alternatives), and it would complicate further development.

Recommended answer

Depending on the application, valid solutions are:

  • Shorten the regular expression by using DEFINE for any sub-expressions that are repeated.
  • Increase the maximum regex size by recompiling PHP (see drew010's great answer). This may not be possible in every environment, and it may cause compatibility problems when moving to another server.
  • Split the regular expression at | and process the resulting sub-expressions separately. If the regex is essentially a long list of keywords separated by |, converting it to strtok or to a loop with strpos may be a better and faster choice.
  • Use another language or regex engine, such as C++/Boost, although I did not verify this.
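
For the keyword case mentioned in the third option, a loop over strpos could look roughly like the sketch below. The function name find_keywords and the sample keywords are hypothetical, chosen only for illustration, and the sketch assumes plain case-sensitive substring matching is acceptable.

```php
<?php
// Minimal sketch: locate every occurrence of each keyword with strpos()
// instead of one huge "kw1|kw2|kw3|..." alternation.
// find_keywords() is a hypothetical helper, not part of PHP or PCRE.
function find_keywords(string $haystack, array $keywords): array
{
    $found = [];
    foreach ($keywords as $keyword) {
        if ($keyword === '') {
            continue; // an empty needle would never advance the offset
        }
        $offset = 0;
        while (($pos = strpos($haystack, $keyword, $offset)) !== false) {
            $found[] = [$keyword, $pos];
            // resume searching just past this occurrence
            $offset = $pos + strlen($keyword);
        }
    }
    return $found;
}

print_r(find_keywords('foo bar foo', ['foo', 'bar']));
```

Note that the results are grouped by keyword rather than ordered by position, and that a regex alternation reports only one alternative at each position, so the two approaches are not drop-in equivalents when keywords overlap.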

Solution to my specific problem: following Mario's comment, I used the (?(DEFINE)...) construct for the sub-expressions that were reused several times. This reduced my regex from 41,127 characters to "only" 4,071, and it was an elegant way to get rid of the "Regular Expression is too large" error.

See: the (?(DEFINE)...) syntax reference on rexegg.com
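
As a rough illustration of the DEFINE approach, the sketch below contrasts interpolating a PHP variable with defining a named sub-pattern once and calling it with (?&name). The IPv4 octet sub-pattern and the sample text are hypothetical stand-ins, not taken from the original regex. My understanding is that a subroutine call is stored as a short reference instead of a copy of the sub-pattern, which is why the overall pattern shrinks, but treat that as an assumption, not a guarantee.

```php
<?php
// Hypothetical sub-pattern (not from the original post): one IPv4 octet.
// With string concatenation or interpolation, PHP pastes this text into
// the pattern four times before PCRE ever sees it.
$octet = '(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)';
$expanded = '/\b' . $octet . '(?:\.' . $octet . '){3}\b/';

// With DEFINE, the octet is written once and reused via (?&octet).
// The DEFINE group never matches anything itself; it only holds definitions.
// The x flag lets the pattern be laid out with insignificant whitespace.
$defined = '/
    (?(DEFINE)
        (?<octet> 25[0-5] | 2[0-4]\d | 1\d\d | [1-9]?\d )
    )
    \b (?&octet) (?: \. (?&octet) ){3} \b
/x';

$text = 'hosts: 192.168.0.1 and 8.8.8.8';
preg_match_all($expanded, $text, $a);
preg_match_all($defined, $text, $b);

var_dump($a[0] === $b[0]); // both patterns should find the same matches
print_r($b[0]);
```

In a pattern this small the DEFINE version is not actually shorter. The saving shows up when the sub-pattern is long or is reused many times, as in the 41,127-character case above.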
