in_array()性能优化 [英] in_array() performance optimization

查看:260
本文介绍了in_array()性能优化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下情况:

if(in_array($needle, $haystack) ||
    in_array($needle . "somePostfix", $haystack) ||
    in_array($needle . "someOtherPostfix", $haystack) ||
    // and some more) {
    // do something
}

我的干草堆包含超过1万个元素,此检查大约需要400毫秒.我知道in_array必须遍历整个数组多次.在我的情况下,常见的情况是找不到该元素.并且我尝试通过创建以下仅在干草堆上迭代一次的方法来改善这一点:

My haystack contains more than 10k elements, and this check takes about 400ms. I know that in_array has to iterate over the whole array multiple times. In my case the common case is that the element is not found. and I tried to improve this by creating the following method that only iterates once over the haystack:

function wildcardInArray($needle, $haystack) {
    foreach ($haystack as $value) {
        if (true === fnmatch($needle . '*', $haystack)) {
            return true;
        }
    }
    return false;
}

但这进一步降低了我的表现,在我看来fnmatch是瓶颈.

But this decreases my performance even more, seems to me that fnmatch is the bottleneck.

在这种情况下,数组搜索是否有改进?

Is there any improvement for this case of array search?

推荐答案

这是一个非常有趣的问题,似乎没有一个很好的答案.我做了一些非常不科学的基准测试,对于包含100000个元素的$haystack,我无法获得比in_array更快的速度.

This is a very interesting question that doesn't appear to have a great answer. I did some very unscientific bench-marking and I was not able to get any faster than in_array for a $haystack with 100000 elements.

PHP 5.5.9-1ubuntu4.14 (cli) (built: Oct 28 2015 01:34:46) 
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
    with Zend OPcache v7.0.3, Copyright (c) 1999-2014, by Zend Technologies
    with Xdebug v2.2.3, Copyright (c) 2002-2013, by Derick Rethans

Sorting Time*:    0.19367408752441
Imploding Time**: 0.0207359790802
preg_match:       0.10927486419678
needle ===:       0.083639144897461
in_array:         0.019428968429565
array_flip:       0.028955936431885
array_intersect:  0.15198707580566
array_diff:       0.15532493591309

//*sort without search (binary search wouldn't add much time)
//**time it took to implode the array 
//     (no search was performed, this search WOULD take significant time if implemented)

如您所见,needle ===in_arrayarray_flip中只有三种方法花费的时间少于100ms.在这三者中,in_array显然是最快的.现在的问题是,您有多少个postfix-es?在in_array上的运行时间将是O(n*m)(n是您的干草堆的大小,m是后缀的数量),如果m也很大,则是一个问题.如果m很大,则对数据进行一次排序并在排序后的列表上执行二进制搜索将是O(m*log(n)),它增长得慢得多,但具有较高的初始开销,如上面的排序时间所示.更好的是,如果您的m很大,则可能是array_flip,因为每次搜索在初始翻转后都只需要进行O(1)查找.

As you can see, only three of these methods took less than 100ms, needle ===, in_array and array_flip. And out of these three, in_array was clearly the fastest. Now the question is how many postfix-es do you have? The running time on in_array will be O(n*m) (n is size of your haystack, m is the number of postfixes), which is a problem if m is also very large. If m is significantly large, sorting the data once and performing a binary search on the sorted list will be O(m*log(n)) which grows much slower, but has a higher initial overhead as shown in the sorting time above. Even better, if you have a very large m would probably be array_flip as each search should only take O(1) lookup after the initial flip.

干草堆创建

$haystack = array();

function getRandomWord($len = 10) {
        $len = rand(3,10);
        $word = array_merge(range('a', 'z'), range('A', 'Z'));
            shuffle($word);
            return substr(implode($word), 0, $len);
}

$numWords = 100000;
for($i = 0; $i < $numWords; $i++) {
    $haystack[] = getRandomWord();
}

测试

//*Sorting*    
$copy = $haystack;
sort($copy);


//implode    
$copy = implode($haystack, " ");


//*preg_match_test*
function preg_match_test($regex, $haystack) {
    $matches = false;
    foreach($haystack as $value) {
        if (preg_match($regex, $value)) {
            $matches = true;
            break;
        }
    }
    return $matches;
}

//needle ===
function equalsNeedle($needles, $haystack) {
    $matches = false;
    foreach ($haystack as $value) {
        foreach($needles as $needle) {
            if ($needle === $value) {
                $matches = true;
                break 2;
            }
        }
    }
    return $matches;
}

//in_array
function baseCase($needles, $haystack) {
    $matches = false;
    foreach($needles as $needle) {
        if (in_array($needle, $haystack)) {
            $matches = true;
            break;
        }
    }
    return $matches;
}

//array_flip
function arrayFlipping($needles, $haystack) {
    $matches = false;
    $copy = array_flip($haystack);
    foreach ($needles as $needle) {
        if (array_key_exists($needle, $copy)) {
            $matches = true;
        }
    }
    return $matches;
}

//array_intersect
function arrayIntersect($needles, $haystack) {
    if (count(array_intersect($needles, $haystack)) > 0) {
        return true;
    }
    return false;
}

//array_diff
function arrayDiff($needles, $haystack) {
    if (count(array_diff($needles, $haystack)) !== count($needles)) {
        return true;
    }
    return false;
}

呼叫代码

$array = array("foo","foobar","foobazz","foobuzz");
$base = "foo";
$regex = "/^$base(bizz|bazz|buzz|)$/";

echo "preg_match: ";
preg_match_test($regex, $haystack);
echo "needle === ";
equalsNeedle($array, $haystack);
echo "in_array:  ";
baseCase($array, $haystack);
echo "array_flip:  ";
arrayFlipping($array, $haystack);
echo "array_intersect:  ";
arrayIntersect($array, $haystack);
echo "array_diff:  ";
arrayDiff($array, $haystack);

所有测试都使用microtime(true)用定时代码进行包装.

All tests were wrapped with timing code using microtime(true).

这篇关于in_array()性能优化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆