找到PHP两个大型阵列之间的差异的最佳方法 [英] Best way to find differences between two large arrays in PHP

查看:141
本文介绍了找到PHP两个大型阵列之间的差异的最佳方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个非常大的阵列(〜2,500,000大小)。我需要找到这些阵列之间的差异。 通过不同的我的意思是我需要的合成数组与在阵列1,但不是在阵列2.我已经使用的和array_diff (),但它需要超过一个半小时!

I have 2 very large arrays (of size ~2,500,000). I need to find difference between these arrays. By difference I mean I need a resultant array with values that are in array 1 but not in array 2. I have used array_diff() but it takes more than half an hour!

的第一阵列从一个DB和第二阵列从另一分贝到来。他们不是同一个数据库服务器上。该阵列是相同的尺寸不。我处理的移动电话号码的数量庞大。我需要找出那些手机号码这是在一个列表中,但不是在另一个列表

The first array is coming from one DB and second array from another db. They aren't on the same Database server. The arrays aren't of same size. I am dealing with huge number of mobile numbers. I need to find out those mobile numbers which are in one list, but aren't in another list

数组是用​​数字键正常的阵列。差异code是如下:

arrays are normal arrays with numeric keys. Diff code is as follows:

$numbers_list = array_diff($numbers_list, $some_other_list);

是否有这样做的更好的办法?请大家帮帮忙。

Is there a better way of doing this? Please help.

推荐答案

这是一个简单的算法。

This is the simple algorithm.

  1. 翻转第一阵列。值将成为密钥。因此,重复值将被丢弃。
  2. 翻转第2个数组(可选)
  3. 为您在第2个数组的每个元素,如果它存在于第一个数组。
  1. Flip 1st array. Values will become keys. So repeated values will be discarded.
  2. Flip 2nd array (optional)
  3. Check for each element in 2nd array if it exists in 1st array.

由于您使用的是非常大的阵列,它会消耗大量的内存

As you are working with very large arrays, it'll consume a lot of memory.

下面是我的实现,

$a = file("l.a"); // l.a is a file contains 2,500,000 lines
$b = file("l.b");

function large_array_diff($b, $a){
    // Flipping 
    $at = array_flip($a);
    $bt = array_flip($b); 
    // checking
    $d = array_diff_key($bt, $at);

    return array_keys($d);   
}

我跑了它使用的 4G 内存限制。 3G也适用。刚刚测试。

I ran it using 4G memory limit. 3G also works. Just tested.

$ time php -d memory_limit=4G diff_la.php

大概花了<强11秒!

real    0m10.612s
user    0m8.940s
sys     0m1.460s

更新

随着code运行 2倍的速度比 large_array_diff 功能如上所述。

Following code runs 2x faster than large_array_diff function stated above.

function flip_isset_diff($b, $a) {
    $at = array_flip($a);
    $d = array();
    foreach ($b as $i)
        if (!isset($at[$i])) 
            $d[] = $i;

    return $d;
}

由于它不叫 array_flip (1次), array_diff_key array_keys 。大量的CPU周期被保存缘于此。

Because it does not call array_flip (1 time), array_diff_key and array_keys. Lots of CPU cycles are saved due to this.

这篇关于找到PHP两个大型阵列之间的差异的最佳方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆