RegExp in preg_match function returning browser error


Problem description

The following function breaks with the regexp I've provided in the $pattern variable. If I change the regexp I'm fine, so I think that's the problem. I'm not seeing the problem, though, and I'm not receiving a standard PHP error even though they're turned on.

function parseAPIResults($results){
//Takes results from getAPIResults, returns array.

    $pattern = '/\[(.|\n)+\]/';
    $resultsArray = preg_match($pattern, $results, $matches);

}

Firefox 6: The connection was reset

Chrome 14: Error 101 (net::ERR_CONNECTION_RESET): The connection was reset.

IE 8: Internet Explorer cannot display the webpage

UPDATE:
Apache/PHP may be crashing. Here's the Apache error log from when I run the script:

[Sat Oct 01 11:41:40 2011] [notice] Parent: child process exited with status 255 -- Restarting.
[Sat Oct 01 11:41:40 2011] [notice] Apache/2.2.11 (Win32) PHP/5.3.0 configured -- resuming normal operations

Running WAMP 2.0 on Windows 7.

Solution

Simple question. Complex answer!

Yes, this class of regex will repeatably (and silently) crash Apache/PHP with an unhandled segmentation fault due to a stack overflow!

Background:

The PHP preg_* family of regex functions uses the powerful PCRE library by Philip Hazel. With this library, a certain class of regex requires many recursive calls to its internal match() function, which uses up a lot of stack space (the stack space used is directly proportional to the size of the subject string being matched). Thus, if the subject string is too long, a stack overflow and corresponding segmentation fault will occur. This behavior is described at the end of the PCRE documentation, in the section titled pcrestack.

PHP Bug 1: PHP sets pcre.recursion_limit too large.

The PCRE documentation describes how to avoid a stack overflow segmentation fault by limiting the recursion depth to a safe value roughly equal to the stack size of the linked application divided by 500. When the recursion depth is properly limited as recommended, the library does not generate a stack overflow and instead gracefully exits with an error code. Under PHP, this maximum recursion depth is specified with the pcre.recursion_limit configuration variable and (unfortunately) the default value is set to 100,000. This value is TOO BIG! Here is a table of safe values of pcre.recursion_limit for a variety of executable stack sizes:

Stacksize   pcre.recursion_limit
 64 MB      134217
 32 MB      67108
 16 MB      33554
  8 MB      16777
  4 MB      8388
  2 MB      4194
  1 MB      2097
512 KB      1048
256 KB      524
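
The table values follow from the stack size / 500 rule of thumb quoted above. As a minimal sketch (safeRecursionLimit() is a hypothetical helper introduced here for illustration, not part of PHP or PCRE), the arithmetic can be reproduced like this:

```php
<?php
// Rule of thumb from the answer: safe limit ~= stack size in bytes / 500.
// safeRecursionLimit() is a hypothetical helper, not a PHP or PCRE function.
function safeRecursionLimit($stackBytes)
{
    // Round down so the result never exceeds the recommended value.
    return (int) floor($stackBytes / 500);
}

// A few of the stack sizes from the table, expressed in bytes.
$stackSizes = array(
    '8 MB'   => 8 * 1024 * 1024,
    '1 MB'   => 1024 * 1024,
    '256 KB' => 256 * 1024,
);

foreach ($stackSizes as $label => $bytes) {
    printf("%-7s %d\n", $label, safeRecursionLimit($bytes));
}
```

Rounding down reproduces the table entries for these three sizes (16777, 2097 and 524).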

Thus, for the Win32 build of the Apache webserver (httpd.exe), which has a (relatively small) stack size of 256KB, the correct value of pcre.recursion_limit should be set to 524. This can be accomplished with the following line of PHP code:

ini_set("pcre.recursion_limit", "524"); // PHP default is 100,000.

When this code is added to the PHP script, the stack overflow does NOT occur; instead, the PCRE library exits gracefully and preg_match() SHOULD report a meaningful error code. (But unfortunately, due to another PHP bug, preg_match() does not.)

PHP Bug 2: preg_match() does not return FALSE on error.

The PHP documentation for preg_match() says that it returns FALSE on error. Unfortunately, PHP versions 5.3.3 and below have a bug (#52732) where preg_match() does NOT return FALSE on error (it instead returns int(0), which is the same value returned in the case of a non-match). This bug was fixed in PHP version 5.3.4.

Solution:

Assuming you will continue using WAMP 2.0 (with PHP 5.3.0) the solution needs to take both of the above bugs into consideration. Here is what I would recommend:

  • Reduce pcre.recursion_limit to a safe value: 524.
  • Explicitly check for a PCRE error whenever preg_match() returns anything other than int(1).
  • If preg_match() returns int(1), then the match was successful.
  • If preg_match() returns int(0), then the match was either not successful, or there was an error.
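
The recommendations above could be packaged as a small wrapper. This is only a sketch: checkedPregMatch() is a hypothetical name, not a built-in, and the 524 limit assumes the 256KB httpd.exe stack discussed earlier. It treats any result other than int(1) as suspect and consults preg_last_error() before reporting a non-match.

```php
<?php
// checkedPregMatch() is a hypothetical wrapper, not a built-in PHP function.
// It returns 1 on a match, 0 on a genuine non-match, and false on any PCRE
// error, whether or not the running PHP version is affected by bug #52732.
function checkedPregMatch($pattern, $subject, &$matches = null)
{
    $result = preg_match($pattern, $subject, $matches);
    if ($result === 1) {
        return 1;
    }
    // Either a non-match or an error: ask PCRE which one it was.
    if ($result === false || preg_last_error() !== PREG_NO_ERROR) {
        return false;
    }
    return 0;
}

// Assumption: the script runs under an executable with a 256KB stack.
ini_set("pcre.recursion_limit", "524");

$status = checkedPregMatch('/\[(.|\n)+\]/', '[abc]', $m);
if ($status === false) {
    echo "PCRE error code: ", preg_last_error(), "\n";
} elseif ($status === 1) {
    echo "Matched: ", $m[0], "\n";
} else {
    echo "No match\n";
}
```

Callers can then compare the return value strictly against false, as the PHP documentation for preg_match() already suggests, without caring which PHP version is installed.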

Here is a modified version of your script (designed to be run from the command line) that determines the subject string length that results in the recursion limit error:

<?php
// This test script is designed to be run from the command line.
// It measures the subject string length that results in a
// PREG_RECURSION_LIMIT_ERROR error in the preg_match() function.

echo("Entering TEST.PHP...\n");

// Set and display pcre.recursion_limit. (set to stacksize / 500).
// Under Win32 httpd.exe has a stack = 256KB and 8MB for php.exe.
//ini_set("pcre.recursion_limit", "524");       // Stacksize = 256KB.
ini_set("pcre.recursion_limit", "16777");   // Stacksize = 8MB.
echo(sprintf("PCRE pcre.recursion_limit is set to %s\n",
    ini_get("pcre.recursion_limit")));

function parseAPIResults($results){
    $pattern = "/\[(.|\n)+\]/";
    $resultsArray = preg_match($pattern, $results, $matches);
    if ($resultsArray === 1) {
        $msg = 'Successful match.';
    } else {
        // Either an unsuccessful match, or a PCRE error occurred.
        $pcre_err = preg_last_error();  // PHP 5.2 and above.
        if ($pcre_err === PREG_NO_ERROR) {
            $msg = 'Successful non-match.';
        } else {
            // preg_match error!
            switch ($pcre_err) {
                case PREG_INTERNAL_ERROR:
                    $msg = 'PREG_INTERNAL_ERROR';
                    break;
                case PREG_BACKTRACK_LIMIT_ERROR:
                    $msg = 'PREG_BACKTRACK_LIMIT_ERROR';
                    break;
                case PREG_RECURSION_LIMIT_ERROR:
                    $msg = 'PREG_RECURSION_LIMIT_ERROR';
                    break;
                case PREG_BAD_UTF8_ERROR:
                    $msg = 'PREG_BAD_UTF8_ERROR';
                    break;
                case PREG_BAD_UTF8_OFFSET_ERROR:
                    $msg = 'PREG_BAD_UTF8_OFFSET_ERROR';
                    break;
                default:
                    $msg = 'Unrecognized PREG error';
                    break;
            }
        }
    }
    return($msg);
}

// Build a matching test string of increasing size.
function buildTestString() {
    static $content = "";
    $content .= "A";
    return '['. $content .']';
}

// Find subject string length that results in error.
for (;;) { // Infinite loop. Break out.
    $str = buildTestString();
    $msg = parseAPIResults($str);
    printf("Length =%10d\r", strlen($str));
    if ($msg !== 'Successful match.') break;
}

echo(sprintf("\nPCRE_ERROR = \"%s\" at subject string length = %d\n",
    $msg, strlen($str)));

echo("Exiting TEST.PHP...");

?>

When you run this script, it provides a continuous readout of the current length of the subject string. If pcre.recursion_limit is left at its too-high default value, this allows you to measure the length of string that causes the executable to crash.

Comments:

  • Before investigating the answer to this question, I didn't know about the PHP bug where preg_match() fails to return FALSE when an error occurs in the PCRE library. This bug certainly calls into question a LOT of code that uses preg_match! (I'm certainly going to do an inventory of my own PHP code.)
  • Under Windows, the Apache webserver executable (httpd.exe) is built with a stacksize of 256KB. The PHP command line executable (php.exe) is built with a stacksize of 8MB. The safe value for pcre.recursion_limit should be set in accordance with the executable that the script is being run under (524 and 16777 respectively).
  • Under *nix systems, the Apache webserver and command line executables are both typically built with a stacksize of 8MB, so this problem is not encountered as often.
  • The PHP developers should set the default value of pcre.recursion_limit to a safe value.
  • The PHP developers should apply the preg_match() bugfix to PHP version 5.2.
  • The stacksize of a Windows executable can be manually modified using the CFF Explorer freeware program. You can use this program to increase the stacksize of the Apache httpd.exe executable. (This works under XP but Vista and Win7 might complain.)
