找到PHP两个大型阵列之间的差异的最佳方法 [英] Best way to find differences between two large arrays in PHP
问题描述
我有两个非常大的阵列(〜2,500,000大小)。我需要找到这些阵列之间的差异。 通过不同的我的意思是我需要的合成数组与在阵列1,但不是在阵列2.我已经使用的和array_diff (),但它需要超过一个半小时!
I have 2 very large arrays (of size ~2,500,000). I need to find difference between these arrays. By difference I mean I need a resultant array with values that are in array 1 but not in array 2. I have used array_diff() but it takes more than half an hour!
的第一阵列从一个DB和第二阵列从另一分贝到来。他们不是同一个数据库服务器上。该阵列是相同的尺寸不。我处理的移动电话号码的数量庞大。我需要找出那些手机号码这是在一个列表中,但不是在另一个列表
The first array is coming from one DB and second array from another db. They aren't on the same Database server. The arrays aren't of same size. I am dealing with huge number of mobile numbers. I need to find out those mobile numbers which are in one list, but aren't in another list
数组是用数字键正常的阵列。差异code是如下:
arrays are normal arrays with numeric keys. Diff code is as follows:
$numbers_list = array_diff($numbers_list, $some_other_list);
是否有这样做的更好的办法?请大家帮帮忙。
Is there a better way of doing this? Please help.
推荐答案
这是一个简单的算法。
This is the simple algorithm.
- 翻转第一阵列。值将成为密钥。因此,重复值将被丢弃。
- 翻转第2个数组(可选)
- 为您在第2个数组的每个元素,如果它存在于第一个数组。
- Flip 1st array. Values will become keys. So repeated values will be discarded.
- Flip 2nd array (optional)
- Check for each element in 2nd array if it exists in 1st array.
由于您使用的是非常大的阵列,它会消耗大量的内存。
As you are working with very large arrays, it'll consume a lot of memory.
下面是我的实现,
$a = file("l.a"); // l.a is a file contains 2,500,000 lines
$b = file("l.b");
function large_array_diff($b, $a){
// Flipping
$at = array_flip($a);
$bt = array_flip($b);
// checking
$d = array_diff_key($bt, $at);
return array_keys($d);
}
我跑了它使用的 4G
内存限制。 3G也适用。刚刚测试。
I ran it using 4G
memory limit. 3G also works. Just tested.
$ time php -d memory_limit=4G diff_la.php
大概花了<强11秒!
real 0m10.612s
user 0m8.940s
sys 0m1.460s
更新
随着code运行 2倍的速度比 large_array_diff
功能如上所述。
Following code runs 2x faster than large_array_diff
function stated above.
function flip_isset_diff($b, $a) {
$at = array_flip($a);
$d = array();
foreach ($b as $i)
if (!isset($at[$i]))
$d[] = $i;
return $d;
}
由于它不叫 array_flip
(1次), array_diff_key
和 array_keys
。大量的CPU周期被保存缘于此。
Because it does not call array_flip
(1 time), array_diff_key
and array_keys
. Lots of CPU cycles are saved due to this.
这篇关于找到PHP两个大型阵列之间的差异的最佳方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!