PHP and the million array baby


Question

Imagine you have the following array of integers:

array(1, 2, 1, 0, 0, 1, 2, 4, 3, 2, [...] );

The array has up to one million entries. Instead of being hardcoded, the integers have been pre-generated and stored in a JSON-formatted file (approximately 2MB in size). The order of these integers matters: I can't randomly generate the array every time, because it must be consistent and always have the same values at the same indexes.

If this file is read back in PHP afterwards (e.g. using file_get_contents + json_decode), it takes 700 to 900ms just to get the array back. "Okay," I thought, "that's probably reasonable since json_decode has to parse about 2 million characters; let's cache it." APC caches it in an entry that takes about 68MB, which is probably normal since zvals are large. However, retrieving this array from APC still takes a good 600ms, which in my eyes is way too much.

APC does a serialize/unserialize to store and retrieve content, which for a one-million-item array is a long and heavy process.

So, questions:

  • Should I expect this latency whenever I load a one-million-entry array in PHP, no matter the data store or the method? As far as I understand, APC stores the zval itself, so in theory retrieving it from APC should be as fast as it can possibly get (no parsing, no conversion, no disk access).

  • Why is APC so slow for something so simple?

  • Is there any efficient way to load a one-million-entry array entirely in memory using PHP, assuming RAM usage is not a problem?

  • If I were to access only slices of this array based on indexes (e.g. loading the chunk from index 15 to index 76) and never actually have the entire array in memory (yes, I understand this is the sane way of doing it, but I wanted to know all the sides), what would be the most efficient data store for the complete array? Obviously not an RDBMS; I'm thinking Redis, but I would be happy to hear other ideas.

Recommended answer

Say the integers are all in the range 0-15. Then you can store two per byte:

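<?php
// Build a 500,000-byte string; each random byte holds two 4-bit integers (0-15).
$data = '';
for ($i = 0; $i < 500000; ++$i) {
  $data .= chr(mt_rand(0, 255));
}

// Write the serialized string to stdout so it can be redirected to a file.
echo serialize($data);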

To run: php ints.php > ints.ser

Now you have a file holding a 500,000-byte string that contains 1,000,000 random integers from 0 to 15.

To load:

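<?php
$data = unserialize(file_get_contents('ints.ser'));

// Returns the integer (0-15) stored at index $i of the packed string.
function get_data_at($data, $i)
{
  // Two integers share each byte, so index $i lives in byte $i >> 1.
  $byte = ord($data[$i >> 1]);

  // Even indexes use the high nibble, odd indexes the low nibble.
  return ($i & 1) ? $byte & 0xf : $byte >> 4;
}

// Print the first 1000 integers, one per line.
for ($i = 0; $i < 1000; ++$i) {
  echo get_data_at($data, $i), "\n";
}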

The loading time on my machine is about 0.002 seconds.
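To check the load time on another machine, a timing sketch along these lines may help. It is only an illustration: it creates its own stand-in for ints.ser in the system temp directory (an assumption, so the snippet runs on its own), fills it with random_bytes(), and uses hrtime(), which requires PHP 7.3 or later. Actual timings will vary with hardware and disk cache.

```php
<?php
// Stand-in for ints.ser: 500,000 random bytes, i.e. 1,000,000 packed 4-bit integers.
// The temp-file location is an assumption made so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'ints');
file_put_contents($path, serialize(random_bytes(500000)));

// Time only the read + unserialize step, using the monotonic clock (nanoseconds).
$start = hrtime(true);
$data = unserialize(file_get_contents($path));
$elapsedMs = (hrtime(true) - $start) / 1e6;

printf("Loaded %d bytes in %.3f ms\n", strlen($data), $elapsedMs);

// Clean up the temporary file.
unlink($path);
```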

Of course, this might not be directly applicable to your situation, but it will be much faster than a bloated PHP array of a million entries. Quite frankly, having an array that large in PHP is never the proper solution.

I'm not saying this is the proper solution either, but it definitely is workable if it fits your parameters.
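The generator script above fills the string with random bytes. For an existing array, such as the one decoded from the JSON file, the packing step might look like the sketch below. The helpers pack_nibbles() and nibble_at() are hypothetical names introduced here, and the sketch assumes every value is in the range 0-15 and that the array has sequential integer keys.

```php
<?php
// Hypothetical helper: packs integers in the range 0-15 two per byte,
// high nibble first, which is the layout get_data_at() reads.
function pack_nibbles(array $ints): string
{
    $packed = '';
    $count = count($ints);
    for ($i = 0; $i < $count; $i += 2) {
        $high = $ints[$i];
        // With an odd number of entries, the final low nibble is padded with 0.
        $low = ($i + 1 < $count) ? $ints[$i + 1] : 0;
        $packed .= chr(($high << 4) | $low);
    }
    return $packed;
}

// Hypothetical helper: same lookup logic as get_data_at() in the answer.
function nibble_at(string $packed, int $i): int
{
    $byte = ord($packed[$i >> 1]);
    return ($i & 1) ? $byte & 0x0f : $byte >> 4;
}

// The first ten values from the question's example array.
$ints = [1, 2, 1, 0, 0, 1, 2, 4, 3, 2];
$packed = pack_nibbles($ints);

echo strlen($packed), "\n";        // 5 (ten integers fit in five bytes)
echo nibble_at($packed, 7), "\n";  // 4
```

Since the packing only has to happen once, when the data file is generated, its cost does not affect the load time discussed above.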

Note that if your array had integers in the 0-255 range, you could drop the packing and just access the data as ord($data[$i]). In that case, your string would be one million bytes long.
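As a rough illustration of the one-byte-per-integer layout, the sketch below builds such a string from a small sample array with chr() and reads values back with ord(). The sample values are made up for the example; a real array would come from the decoded JSON file and must contain only values from 0 to 255.

```php
<?php
// Sample values in the 0-255 range (illustrative only).
$ints = [1, 2, 1, 0, 0, 1, 2, 4, 3, 2, 255];

// chr() turns each integer into a single byte; implode() joins the bytes.
$bytes = implode('', array_map('chr', $ints));

echo strlen($bytes), "\n";    // 11 (one byte per integer)
echo ord($bytes[7]), "\n";    // 4
echo ord($bytes[10]), "\n";   // 255
```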

Finally, according to the documentation of file_get_contents(), PHP will use memory mapping if the OS supports it. If so, your best performance would come from dumping raw bytes to a file and using it like this:

$ints = file_get_contents('ints.raw');
echo ord($ints[25]);

This assumes that ints.raw is exactly one million bytes long.
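With a raw file like this, the slice access mentioned in the question (for example, indexes 15 to 76) could also be served without loading the whole file, by seeking to the offset and reading only the bytes needed. The sketch below is one possible approach, not something the answer itself proposes. read_slice() is a hypothetical helper, and the demo file is created in the temp directory only so the example runs on its own.

```php
<?php
// Hypothetical helper: returns the integers at indexes $from..$to (inclusive)
// from a raw file that stores one integer (0-255) per byte.
function read_slice(string $path, int $from, int $to): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // One byte per integer, so the byte offset equals the array index.
    fseek($handle, $from);
    $bytes = fread($handle, $to - $from + 1);
    fclose($handle);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read from $path");
    }

    // unpack() returns a 1-based array, so reindex from 0.
    return array_values(unpack('C*', $bytes));
}

// Build a small demo file (values 0..99) so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'ints');
file_put_contents($path, implode('', array_map('chr', range(0, 99))));

$slice = read_slice($path, 15, 76);
echo count($slice), "\n";                 // 62
echo $slice[0], ' ', end($slice), "\n";   // 15 76

// Clean up the demo file.
unlink($path);
```

The design choice here is to let the file system do the indexing: because every entry has a fixed width of one byte, no index structure or external data store is needed to find a slice.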
