Preg_replace的效率 [英] Efficiency of Preg_replace

查看:73
本文介绍了Preg_replace的效率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

preg_replace()的运行速度比字符串比较快.为什么?正则表达式不应该慢一些吗?

preg_replace() ran faster than string comparisons. Why? Shouldn't regular expressions be slower?

最近的问题中,有关检测任何给定输入中不允许的子字符串数组,我建议将 preg_replace()调用的结果与原始输入进行比较,因为 preg_replace()可以将模式数组作为输入.因此,我的解决方法可能是单个 if ,而其他解决方案则需要一个(或多个)循环.

In a recent question about detecting any of an array of disallowed substrings within a given input, I suggested comparing the result of a preg_replace() call to the original input, since preg_replace() can take an array of patterns as input. Thus my method for this could be a single if whereas the other solutions required one (or many) loops.

我对辩论的答案不感兴趣,因为与循环相比,它的可读性/可维护性差.我的答案仍然是-1,为了易于阅读/易于维护,我会接受它,但是我的方法指出的最大错误是效率低下.那使我感到好奇,并导致我进行了一些测试.我的结果令我有些惊讶:在所有其他因素保持不变的情况下, preg_replace()比其他任何方法都快.

I'm not interested in debating my answer, because really it is less readable/maintainable than the loops. My answer there still holds a -1, and I'll accept that for readability/ease of maintenance, but the biggest fault pointed out with my method was a lack of efficiency. That got me curious, and led me to do some testing. My results were a bit surprising to me: with all other factors held equal, preg_replace() was faster than any of the other methods.

您能解释为什么会这样吗?

这些测试的代码以及结果如下:

My code for these tests can be found below, along with the results:

$input = "In a recent question about detecting any of an array of disallowed substrings within a given input, I suggested comparing the result of a `preg_replace()` call to the original input, since `preg_replace()` can take an array of patterns as input. Thus my method for this could be a single `if` whereas the other solutions required one (or many) loops. I'm not interested in debating my answer, because really it is less readable/maintainable than the loops. However, the biggest fault pointed out with my method was a lack of efficiency. That got me curious, and led me to do some testing. My results were a bit surprising to me: with all other factors held equal, `preg_replace()` was **faster** than any of the other methods. Can you explain why this was the case?";
$input2 = "Short sentence - no matches";
$input3 = "Word";
$input4 = "Short sentence - matches loop";

$start1 = microtime(true);
$rejectedStrs = array("loop", "efficiency", "explain");
$p_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (str_check($rejectedStrs, $input)) $p_matches++;
    if (str_check($rejectedStrs, $input2)) $p_matches++;
    if (str_check($rejectedStrs, $input3)) $p_matches++;
    if (str_check($rejectedStrs, $input4)) $p_matches++;
}

$start2 = microtime(true);
$rejectedStrs = array("loop", "efficiency", "explain");
$l_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (loop_check($rejectedStrs, $input)) $l_matches++;
    if (loop_check($rejectedStrs, $input2)) $l_matches++;
    if (loop_check($rejectedStrs, $input3)) $l_matches++;
    if (loop_check($rejectedStrs, $input4)) $l_matches++;
}

$start3 = microtime(true);
$rejectedStrs = array("/loop/", "/efficiency/", "/explain/");
$s_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (preg_check($rejectedStrs, $input)) $s_matches++;
    if (preg_check($rejectedStrs, $input2)) $s_matches++;
    if (preg_check($rejectedStrs, $input3)) $s_matches++;
    if (preg_check($rejectedStrs, $input4)) $s_matches++;
}

$end = microtime(true);
echo $p_matches." ".$l_matches." ".$s_matches."\n";
echo "str_match: ".$start1." ".$start2."= ".($start2-$start1)."\nloop_match: ".$start2." ".$start3."=".($start3-$start2)."\npreg_match: ".$start3." ".$end."=".($end-$start3);

function preg_check($rejectedStrs, $input) {
    if($input == preg_replace($rejectedStrs, "", $input)) 
        return true;
    return false;
}

function loop_check($badwords, $string) {

    foreach (str_word_count($string, 1) as $word) {
        foreach ($badwords as $bw) {
            if (stripos($word, $bw) === 0) {
                return false;
            }
        }
    }
    return true;
}

function str_check($badwords, $str) {
    foreach ($badwords as $word) {
        if (stripos(" $str ", " $word ") !== false) {
            return false;
        }
    }
    return true;
}

结果

20000 20000 20000

20000 20000 20000

str_match:1282270516.6934 1282270518.5881 = 1.894730091095

str_match: 1282270516.6934 1282270518.5881= 1.894730091095

loop_match:1282270518.5881 1282270523.0943 = 4.5061857700348

loop_match: 1282270518.5881 1282270523.0943=4.5061857700348

preg_match:1282270523.0943 1282270523.6191 = 0.52475500106812

preg_match: 1282270523.0943 1282270523.6191=0.52475500106812

推荐答案

让我们首先看一下 preg_check loop_check .他们两个都将必须遍历整个字符串,并且他们将必须检查每次遍历中的每个单词.因此,他们的行为至少会是 O(n * m),其中 n 是字符串的长度,而 m 是坏词的数量.您可以通过增加 n m 的值运行算法并绘制3D图形来测试它(但是,您可能会或可能不一定必须使用 n m 的高值以查看此行为).

Let's first look at preg_check and loop_check. Both of them will have to traverse the entire string, and they will have to check each of the individual words in each traversal. So their behavior will at least be O(n*m), where n is the length of the string and m the number of bad words. You can test this by running the algorithm with increasing values of n and m and plotting the 3D graphs (however, you may, or may not, have to run it with very high values of n and m to see this behavior).

loop_check 在这里(渐近地)高效(渐近).原因是字符串中的单词数量与它们的长度不成比例-我似乎记得它通常遵循对数函数.它可能使用哈希表来存储通过这种方式找到的单词,这是在平均恒定时间内完成的(如果我们忽略了可能不得不不时地重建哈希表以容纳更多元素的情况).

loop_check is more (asymptoticly) efficient here. The reason is that the number of words a string has is not proportional to their length -- I seem to recall it typically follows a logarithmic function. It probably uses a hash table to store the words it finds through the way, which is done in average constant time (if we ignore that we may have to rebuild the hash table from time to time to accommodate more elements).

因此, loop_check 将具有如下渐近行为,类似于 n + m * log(n),这比 n * m 好>.

Therefore loop_check will have an asymptotic behavior that follows something like n + m * log(n), which is better than n*m.

现在,这是指算法的渐近行为,即,当 m n 很大时(可能需要非常非常")增长.对于 m n 的较小值,常量起很大作用.特别是,PHP操作码和PHP函数调用的执行比用C实现的同一任务(仅一个函数调用)的开销更大.这并不能使regex算法更快,而只是使 m n 的较小值更快.

Now, this refers to the asymptotic behavior of the algorithms, i.e., when m and n grow very (and it may require "very very") large. For small values of m and n the constants play a big part. In particular, execution of PHP opcodes and PHP function calls are more costly than the same task implemented in C, just one function call away. This doesn't make the regex algorithm faster, it just makes it faster for small values of m and n.

这篇关于Preg_replace的效率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆