PHP - Why am I being warned that my regular expression is too large?

Views: 178 Published: 2020/7/3 7:06:17 Tags: php, regex


Problem description

I would like to use a regular expression to validate user input. I want to allow any combination of letters, numbers, spaces, commas, apostrophes, periods, exclamation marks, and question marks, but I also want to limit the input to 4000 characters. I have come up with the following regular expression to achieve this: /^([a-z]|[0-9]| |,|'|\.|!|\?){1,4000}$/i.

However, when I attempt to use this regular expression to test a subject in PHP with preg_match(), I am given a warning: PHP Warning: preg_match(): Compilation failed: regular expression is too large at offset 37, and the subject is never tested.

I find this strange because if I use an infinite quantifier, the test passes just fine (I demonstrate this situation below).

Why is limiting the repetition to 4000 a problem when infinite repetition is not?

regex-test.php:

<?php

$infinite = "/^([a-z]|[0-9]| |,|'|\.|!|\?)*$/i";        // Allows infinite repetition
$fourk    = "/^([a-z]|[0-9]| |,|'|\.|!|\?){1,4000}$/i"; // Limits repetition to 4000

$string   = "I like apples.";

if ( preg_match($infinite, $string) ){

    echo "Passed infinite repetition. \n";
}

if ( preg_match($fourk, $string) ){

    echo "Passed maximum repetition of 4000. \n";
}

?>

Output:

Passed infinite repetition 
PHP Warning:  preg_match(): Compilation failed: regular expression is too large at offset 37 in regex-test.php on line 16

LINK_SIZE

From the "Handling very large patterns" section of the pcrebuild man page:

Within a compiled pattern, offset values are used to point from one part to another (for example, from an opening parenthesis to an alternation metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values are used for these offsets, leading to a maximum size for a compiled pattern of around 64K.

That means the compiled pattern stores an offset value for every subpattern in the alternation, for every repetition of the group. A bounded quantifier such as {1,4000} is compiled by replicating the group, whereas * compiles it once, so only the bounded version grows with the repetition count. In this case the replicated groups and their offsets push the compiled pattern past the size limit.

This is more clearly expressed in a comment in pcre_internal.h from the PHP distribution:

PCRE keeps offsets in its compiled code as 2-byte quantities (always stored in big-endian order) by default. These are used, for example, to link from the start of a subpattern to its alternatives and its end. The use of 2 bytes per offset limits the size of the compiled regex to around 64K, which is big enough for almost everybody.

Using pcretest, I get the following information:

PCRE version 8.37 2015-04-28

/^([a-z]|[0-9]| |,|'|\.|!|\?){1,575}$/i
Failed: regular expression is too large at offset 36

/^([a-z]|[0-9]| |,|'|\.|!|\?){1,574}$/i
Memory allocation (code space): 65432
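The pcretest figures above allow a rough back-of-the-envelope check of why 4000 repetitions cannot fit. The sketch below assumes the compiled size grows roughly linearly with the repetition count, which is an approximation and not something the PCRE documentation states; the variable names are illustrative only.

```php
<?php
// Figures taken from the pcretest output above.
$codeSpaceBytes = 65432; // compiled size reported for {1,574}
$repetitions    = 574;

// Rough average cost of one copy of the group (assumes linear growth).
$bytesPerRepetition = $codeSpaceBytes / $repetitions;

// Extrapolated size for the {1,4000} pattern from the question.
$estimateFor4000 = $bytesPerRepetition * 4000;

// Two-byte offsets can address at most 65536 bytes (the "around 64K" limit).
$twoByteLimit = 65536;

printf("~%.0f bytes per repetition\n", $bytesPerRepetition);
printf("~%.0f bytes estimated for {1,4000}\n", $estimateFor4000);
printf("%d bytes addressable with 2-byte offsets\n", $twoByteLimit);
```

Under that assumption each copy of the group costs a little over a hundred bytes, so the {1,4000} pattern would need several times the space that two-byte offsets can address.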

There's a Windows version you can download from RexEgg.com.

Regarding other size limitations in PCRE, you can check this post of mine.

If there were a genuine reason to use a huge pattern that could not be simplified any further, the link size could be increased. However, this requires recompiling PHP yourself, so your code would no longer be portable. It should be a last resort, used only when there is no other choice.
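For the pattern in the question, simplification is likely enough. The following is a minimal sketch, not part of the original answer, that assumes the alternation can be replaced by a single character class covering the same characters. A repeated character class is compiled as one small item plus a repeat count, so it should stay well within the default limit. The sample strings are made up for illustration.

```php
<?php
// Same allowed characters as the original alternation, expressed as one
// character class so the repeated item stays small when compiled.
$pattern = "/^[a-z0-9 ,'.!?]{1,4000}$/i";

$samples = [
    "I like apples.",          // allowed characters only
    "Semicolons; not allowed", // ';' is outside the class
    str_repeat("a", 4001),     // one character too long
];

foreach ($samples as $subject) {
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // preg_match() returns false on failure, e.g. a pattern that does not compile.
        echo "Pattern failed to compile or run.\n";
    } elseif ($result === 1) {
        echo "Valid input.\n";
    } else {
        echo "Invalid input.\n";
    }
}
```

Comparing the result against false explicitly also separates a failed compilation from a subject that simply does not match, which the plain if ( preg_match(...) ) test in the question cannot do.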

Also commented in pcre_internal.h:

The macros are controlled by the value of LINK_SIZE. This defaults to 2 in the config.h file, but can be overridden by using -D on the command line. This is automated on Unix systems via the "configure" command.

PCRE link size can be configured to 3 or 4:

./configure -DLINK_SIZE=4

But keep in mind that longer offsets require additional data, which will slow down all calls to preg_* functions.

If you compile PHP on your own, see Installation on Unix systems or Build your own PHP on Windows.
