PHP-将2d数组转换为按特定值分组的3d数组的最快方法 [英] PHP - Fastest way to convert a 2d array into a 3d array that is grouped by a specific value

查看:79
本文介绍了PHP-将2d数组转换为按特定值分组的3d数组的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想转换记录的二维数组:

I would like to convert this two dimensional array of records:

[records] => Array
(
  [0] => Array
  (
    [0] => Pears
    [1] => Green
    [2] => Box
    [3] => 20
  )
  [1] => Array
  (
    [0] => Pears
    [1] => Yellow
    [2] => Packet
    [3] => 4
  )
  [2] => Array
  (
    [0] => Peaches
    [1] => Orange
    [2] => Packet
    [3] => 4
  )
  [3] => Array
  (
    [0] => Apples
    [1] => Red
    [2] => Box
    [3] => 20
  )
)

进入此三维数组,其中每个数组键均按原始数组中的某个值进行分组:

Into this three dimensional array where each array key is grouped by a certain value from the original array:

[converted_records] => Array
(
  [Pears] => Array
  (
    [0] => Array
    (
      [0] => Green
      [1] => Box
      [2] => 20
    )
    [1] => Array
    (
      [0] => Yellow
      [1] => Packet
      [2] => 4
    )
  )
  [Peaches] => Array
  (
    [0] => Array
    (
      [0] => Orange
      [1] => Packet
      [2] => 4
    )
  )
  [Apples] => Array
  (
    [0] => Array
    (
      [0] => Red
      [1] => Box
      [2] => 20
    )
  )
)

我可以这样做:

$array = // Sample data like the first array above
$storage = array();
$cnt = 0;
foreach ($array as $key=>$values) {
  $storage[$values[0]][$cnt] = array (
    0 => $values[1],
    1 => $values[2],
    2 => $values[3]
  );
  $cnt ++;
}

我想知道是否有更理想的方法来做到这一点.我不知道PHP中的任何功能都可以做到这一点,所以我只能假设这基本上是可以完成的.

I wanted to know if there is a more optimal way to do this. I am not aware of any functions within PHP that are capable of this so I can only assume that this is basically how it would be done.

但是问题是,这将被重复很多次,并且每毫秒都会计数,所以我真的想知道完成这项任务的最佳方法是什么?

The problem is though, this is going to be repeated so so many times and every little millisecond is going to count so I really want to know what is the best way to accomplish this task?

编辑

通过如下解析.CSV文件来创建records数组:

The records array is created by parsing a .CSV file as follows:

$records = array_map('str_getcsv', file('file.csv'));

编辑#2

我对一组10个结果(每个5k记录)进行了简单的基准测试,平均运行时间为0.645478秒.当然,在此之前还有其他事情在进行,因此这并不是实际性能的真实指示,而是与其他方法进行比较的良好指示.

I did a simple benchmark test on a set of 10 results (5k records each) to get an average runtime of 0.645478 seconds. Granted there is a few other things going on before this so this is not a true indication of actual performance but a good indication for comparison to other methods.

编辑#3

我用大约20倍的记录进行了测试.我的日常平均水平为14.91971.

I did a test with about 20x the records. The average of my routine was 14.91971.

@ num8er在下面的答案在某些情况下具有$records[$key][] = array_shift($data);,而现在仍未更新.

At some point the answer below by @num8er had $records[$key][] = array_shift($data); before updating the answer as it is now.

当我尝试使用更大的结果集进行测试时,它耗尽了内存,因为它为每个记录生成了一个错误.

When I tried testing with the larger set of results this it ran out of memory as its generating an error for each record.

这就是说,一旦我做了$records[$key][] = $data;,例程便平均以18.03699秒的时间结束,并注释了gc_collect_cycles().

This being said, once i did $records[$key][] = $data; the routine completed with an average of 18.03699 seconds with gc_collect_cycles() commented out.

我得出的结论是,尽管对于较小的文件,@ num8ers方法更快,但是对于较大的文件,我的方法更快.

I've reached the conclusion that although @num8ers method is faster for smaller files, for larger ones my method works out quicker.

推荐答案

使用file()将大文件读取到内存中(读取文件时为第一次迭代)
然后使用array_map迭代各行(将文件的每一行读入数组后的第二次迭代)
在数组上执行foreach(第3次迭代)
寻找性能时这是个坏主意.

您要迭代3次.那么10万条记录呢?会迭代30万次吗?
最有效的方法是在读取文件时执行此操作.只有1次迭代-读取行(100K条记录== 100K迭代):

reading big file to memory using file() (1st iteration when it reads file)
and then iterating lines using array_map (2nd iteration after each line of file is read to array)
doing foreach on array (3rd iteration)
it is bad idea when You're looking for performance.

You're iterating 3 times. so what about 100K records? it will iterate 300K times?
most performant way is to do it while reading file. there is only 1 iteration - reading lines (100K records == 100K iteration):

ini_set('memory_limit', '1024M');
set_time_limit(0);

$file = 'file.csv';
$file = fopen($file, 'r');

$records = array();
while($data = fgetcsv($file)) {
  $key = $data[0];
  if(!isset($records[$key])) {
    $records[$key] = array();
  }

  $records[$key][] = array(0 => $data[1],
                           1 => $data[2],
                           2 => $data[3]);
  gc_collect_cycles();
}

fclose($file);


这里是父级->子级处理大文件:


and here is parent -> children processing for huge files:

<?php

ini_set('memory_limit', '1024M');
set_time_limit(0);

function child_main($file)
{
    $my_pid = getmypid();
    print "Starting child pid: $my_pid\n";

    /**
     * OUR ROUTINE
     */

    $file = fopen($file, 'r');
    $records = array();
    while($data = fgetcsv($file)) {
        $key = $data[0];
        if(!isset($records[$key])) {
            $records[$key] = array();
        }

        $records[$key][] = array(0 => $data[1],
            1 => $data[2],
            2 => $data[3]);
        gc_collect_cycles();
    }
    fclose($file);

    unlink($file);

    return 1;
}


$file = __DIR__."/file.csv";
$files = glob(__DIR__.'/part_*');
if(sizeof($files)==0) {
    exec('split -l 1000 '.$file.' part_'); 
    $files = glob(__DIR__.'/part_*');
}

$children = array();
foreach($files AS $file) {
    if(($pid = pcntl_fork()) == 0) {
        exit(child_main($file));
    }
    else {
        $children[] = $pid;
    }
}

foreach($children as $pid) {
    $pid = pcntl_wait($status);
    if(pcntl_wifexited($status)) {
        $code = pcntl_wexitstatus($status);
        print "pid $pid returned exit code: $code\n";
    }
    else {
        print "$pid was unnaturally terminated\n";
    }
}

?>

这篇关于PHP-将2d数组转换为按特定值分组的3d数组的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆