How does PHP memory actually work

Question

I've always heard about, and searched for, new PHP "good writing practices". For example, it's better (for performance) to check whether an array key exists than to search the array. It also seems to be better for memory:

Assuming we have:

$array = array
(
    'one'   => 1,
    'two'   => 2,
    'three' => 3,
    'four'  => 4,
);

This allocates 1040 bytes of memory, whereas

$array = array
(
    1 => 'one',
    2 => 'two',
    3 => 'three',
    4 => 'four',
);

requires 1136 bytes

I understand that keys and values surely use different storage mechanisms, but can you point me to the principle of how it actually works?

Example 2 (for @teuneboon):

$array = array
(
    'one'   => '1',
    'two'   => '2',
    'three' => '3',
    'four'  => '4',
);

1168 bytes

$array = array
(
    '1' => 'one',
    '2' => 'two',
    '3' => 'three',
    '4' => 'four',
);

1136 bytes

These two entries consume the same memory:

  • 4 => 'four',
  • '4' => 'four',

Solution

Note: the answer below applies to PHP versions prior to 7. PHP 7 introduced major changes, including changes to the value structures.

TL;DR

Your question is not really about "how memory works in PHP" (I assume you mean "memory allocation"). It is about "how arrays work in PHP", and these are two different questions. To summarize what's written below:

  • PHP arrays aren't "arrays" in the classical sense. They are hash-maps.
  • The hash-map behind a PHP array has a specific structure and uses a lot of additional storage, such as internal link pointers.
  • Hash-map items also use additional fields to store information. And yes, it matters whether keys are strings or integers. It also matters what the strings used as your keys actually are.
  • In your case, the string-keys option will "win" in terms of memory. Both options are hashed into a hash-map with ulong (unsigned long) keys, so the real difference lies in the values. The string-keys option has integer (fixed-length) values, while the integer-keys option has string (length-dependent) values. Because of possible collisions, however, this may not always hold.
  • "String-numeric" keys such as '4' are treated as integer keys and translated into an integer hash result, exactly as if they were integer keys. Thus '4' => 'foo' and 4 => 'foo' are the same thing.

Also, an important note: the graphics referred to here are copyright of the PHP Internals Book.

Hash-map for PHP arrays

PHP arrays and C arrays

You should realize one very important thing: PHP is written in C, where an "associative array" simply does not exist. In C, an "array" is exactly what the word says: a contiguous area of memory accessed by consecutive offsets. Your "keys" can only be integers, and only consecutive ones starting from zero. You can't have, for instance, 3, -6, 'foo' as your "keys" there.

To implement arrays as they exist in PHP, a hash-map is used. It applies a hash function to your keys and transforms them into integers, which can then be used as C-array offsets. That function, however, can never create a bijection between string keys and their integer hash results. The reason is simple: the set of strings is much, much larger than the set of integers. Take all strings of length 10 that contain only alphanumeric symbols (0-9, a-z and A-Z, 62 in total). There are 62^10 of them, around 8.39E+17. Compare that with the roughly 4.3E+9 values of a 32-bit unsigned long and you'll get the idea: there will be collisions. Even a 64-bit hash could not be a bijection, because strings can be arbitrarily long.
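The counting argument is easy to check with a little C arithmetic. In this sketch, the helper `count_strings` and the constant `HASH_VALUES_32` are illustrative names, not engine code. It computes 62^10 in 64-bit arithmetic and compares the result with the 2^32 values of a 32-bit hash. By the pigeonhole principle, some distinct keys must then share a hash value.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct strings of exactly `length` symbols over an alphabet
 * of `alphabet_size` symbols, i.e. alphabet_size^length, in 64-bit math. */
static uint64_t count_strings(uint64_t alphabet_size, unsigned length)
{
    assert(alphabet_size > 0);
    uint64_t total = 1;
    for (unsigned i = 0; i < length; i++) {
        assert(total <= UINT64_MAX / alphabet_size);   /* no overflow */
        total *= alphabet_size;
    }
    return total;
}

/* Number of distinct values of a 32-bit unsigned hash: 2^32. */
static const uint64_t HASH_VALUES_32 = UINT64_C(1) << 32;
```

62^10 is 839,299,365,868,340,224, roughly 195 million times more than 2^32.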

PHP hash-map keys & collisions

To resolve collisions, PHP simply places items with the same hash-function result into one linked list. So the hash-map is not just a "list of hashed elements". Instead, it stores pointers to lists of elements, and every element in a given list landed in the same slot. This is where memory allocation comes in. If your array has string keys that did not cause collisions, no additional pointers inside those lists are needed, so less memory is used. The overhead is actually very small, but since we're talking about precise memory allocation, it should be taken into account. In the same way, if your string keys cause many collisions, more pointers are created, so the total memory amount will be a bit larger.

To illustrate the relations within those lists, here's a graphic:

The graphic above shows how PHP resolves collisions after applying the hash function. So one part of your answer lies here: the pointers inside the collision-resolution lists. Elements of these linked lists are usually called buckets. The array that holds pointers to the heads of those lists is internally called arBuckets. For structural optimization, for example to make element deletion faster, a real list element has two pointers: one to the previous element and one to the next. That only widens the memory difference between non-colliding and colliding arrays a little. It doesn't change the concept itself.
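Here is a minimal C sketch of these collision lists. It assumes a tiny 8-slot table and keeps only the h, pNext and pLast fields from the Bucket struct. The type `Node` and the helper names are hypothetical. New buckets go at the head of their slot's chain. pLast lets a bucket be unlinked without walking the chain to find its predecessor.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified collision-chain node: only the fields needed here, named
 * after the Bucket struct (values, keys and the order list are omitted). */
typedef struct node {
    unsigned long h;        /* hash value of the key */
    struct node *pNext;     /* next bucket in the same slot's chain */
    struct node *pLast;     /* previous bucket in the same slot's chain */
} Node;

#define TABLE_SIZE 8u                   /* must be a power of two */
#define TABLE_MASK (TABLE_SIZE - 1u)

/* Insert at the head of the chain for slot h & TABLE_MASK. */
static void chain_insert(Node *slots[], Node *n)
{
    assert(n != NULL);
    Node **head = &slots[n->h & TABLE_MASK];
    n->pLast = NULL;
    n->pNext = *head;
    if (*head != NULL)
        (*head)->pLast = n;
    *head = n;
}

/* Unlink in O(1): pLast gives the predecessor directly. */
static void chain_remove(Node *slots[], Node *n)
{
    if (n->pLast != NULL)
        n->pLast->pNext = n->pNext;
    else
        slots[n->h & TABLE_MASK] = n->pNext;   /* n was the head */
    if (n->pNext != NULL)
        n->pNext->pLast = n->pLast;
    n->pNext = n->pLast = NULL;
}

/* Count the buckets chained in the slot that h maps to. */
static size_t chain_length(Node *slots[], unsigned long h)
{
    size_t len = 0;
    for (const Node *p = slots[h & TABLE_MASK]; p != NULL; p = p->pNext)
        len++;
    return len;
}
```

With 8 slots, hash values 1, 9 and 17 share their low three bits. They all land in slot 1 and form one chain, while a bucket with h = 2 sits alone in slot 2.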

One more list: order

To fully support arrays as they work in PHP, order must also be maintained. This is done with another internal list, and every array element is a member of it too. It makes no difference in terms of memory allocation, since both options have to maintain this list. I'm mentioning it for the full picture. Here's the graphic:

In addition to pListLast and pListNext in each element, the table stores pointers to the head and tail of the order list. Again, this isn't directly related to your question. Below I'll dump the internal bucket structure, where these pointers are present.
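A similarly stripped-down C sketch of the order list follows. It assumes each element carries only pListNext/pListLast and the table only pListHead/pListTail. `Elem`, `OrderList` and the helper functions are illustrative names. Appending at the tail preserves insertion order, regardless of which slot each element hashes to.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified element carrying only the order-list links from the Bucket
 * struct (pListNext / pListLast) plus an integer payload. */
typedef struct elem {
    int value;
    struct elem *pListNext;
    struct elem *pListLast;
} Elem;

/* Head and tail of the order list, as in HashTable's pListHead/pListTail. */
typedef struct {
    Elem *pListHead;
    Elem *pListTail;
} OrderList;

/* Append at the tail, so traversal from the head follows insertion order. */
static void order_append(OrderList *l, Elem *e)
{
    assert(l != NULL && e != NULL);
    e->pListNext = NULL;
    e->pListLast = l->pListTail;
    if (l->pListTail != NULL)
        l->pListTail->pListNext = e;
    else
        l->pListHead = e;           /* first element */
    l->pListTail = e;
}

/* Copy values in order-list order into out[]; returns how many were copied. */
static size_t order_collect(const OrderList *l, int out[], size_t max)
{
    size_t n = 0;
    for (const Elem *e = l->pListHead; e != NULL && n < max; e = e->pListNext)
        out[n++] = e->value;
    return n;
}
```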

Array element from inside

Now we're ready to look at what an array element, i.e. a bucket, actually is:

typedef struct bucket {
    ulong h;
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    char *arKey;
} Bucket;
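The Bucket struct can be compiled on its own to show the fixed cost of each element. This sketch replaces the Zend typedefs `ulong` and `uint` with `unsigned long` and `unsigned int`, which is an assumption about what they expand to. It also adds a hypothetical helper, `bucket_fixed_bytes`. The helper counts only the bucket structs, not key strings, values or the arBuckets pointer array.

```c
#include <assert.h>
#include <stddef.h>

/* The Bucket struct from above, with the Zend typedefs spelled out
 * (assumption: ulong = unsigned long, uint = unsigned int). */
typedef struct bucket {
    unsigned long h;
    unsigned int nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    char *arKey;
} Bucket;

/* Compile-time sanity check: padding can only add to the raw field sizes. */
static_assert(sizeof(Bucket) >= sizeof(unsigned long) + sizeof(unsigned int)
                                    + 7 * sizeof(void *),
              "Bucket is at least as large as its fields");

/* Hypothetical helper: bytes taken by the Bucket structs alone for
 * n_elements entries (no key strings, no values, no arBuckets array). */
static size_t bucket_fixed_bytes(size_t n_elements)
{
    return n_elements * sizeof(Bucket);
}
```

On a typical 64-bit Linux (LP64) build, sizeof(Bucket) comes out at 72 bytes. That is 8 for h, 4 for nKeyLength plus 4 bytes of padding, and 7 × 8 for the pointers.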

In the Bucket struct:

  • h is the integer (ulong) value of the key, i.e. the result of the hash function. For integer keys it is the key itself, because the hash function returns its input unchanged.
  • pNext / pLast are pointers inside the collision-resolution linked list.
  • pListNext / pListLast are pointers inside the order-resolution linked list.
  • arKey / nKeyLength hold the string key itself and its length (nKeyLength is 0 for integer keys).
  • pData is a pointer to the stored value. The stored value isn't the one inserted at array creation. It's a copy, and to avoid unnecessary overhead PHP keeps it in pDataPtr (so pData = &pDataPtr).
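The pData = &pDataPtr trick can be sketched in a few lines of C. The sketch is modelled on the idea in the last bullet, not copied from the engine. The `Slot` type and the `slot_store`/`slot_release` helpers are hypothetical. A value exactly the size of a pointer is kept inside the slot itself, so no extra allocation is needed. In PHP 5 arrays, that value is the zval pointer. Anything larger gets its own heap copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-field slot illustrating the pData / pDataPtr idea. */
typedef struct {
    void *pData;      /* always points at the stored bytes */
    void *pDataPtr;   /* inline storage for a pointer-sized value */
} Slot;

/* Copy `size` bytes from `data` into the slot. Pointer-sized values are
 * stored inline (pData = &pDataPtr); anything else gets a heap copy.
 * Returns 0 on success, -1 if malloc fails. */
static int slot_store(Slot *s, const void *data, size_t size)
{
    assert(s != NULL && data != NULL);
    if (size == sizeof(void *)) {
        memcpy(&s->pDataPtr, data, sizeof(void *));
        s->pData = &s->pDataPtr;        /* no extra allocation */
    } else {
        s->pData = malloc(size);
        if (s->pData == NULL)
            return -1;
        memcpy(s->pData, data, size);
        s->pDataPtr = NULL;
    }
    return 0;
}

/* Free the heap copy, if there is one; inline values need no free. */
static void slot_release(Slot *s)
{
    if (s->pData != &s->pDataPtr)
        free(s->pData);
    s->pData = NULL;
}
```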

From this viewpoint, you can see where the difference comes from. A string key is hashed, so h is always a ulong and therefore always the same size. What matters is what is stored in the values. Your string-keys array has integer values, while your integer-keys array has string values, and that makes the difference. However, this isn't magic: you can't always "save memory" by storing string keys this way. If your keys are large and there are many of them, they will cause collision overhead. This is very likely, though of course not guaranteed. It will only "work" for arbitrary short strings that don't cause many collisions.
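Here is a sketch of a DJBX33A-style string hash ("times 33, add"), the family PHP 5 uses for string keys. Integer keys use the identity mapping instead. Together they illustrate why h has the same size whatever the key. The function names are hypothetical. The engine's real hash is unrolled and may differ in details, such as whether the trailing NUL is included. The point is only that any key, short or long, ends up as one `unsigned long`.

```c
#include <assert.h>
#include <stddef.h>

/* DJBX33A-style hash: start at 5381, then h = h * 33 + byte for every
 * byte of the key. Unsigned overflow simply wraps around. */
static unsigned long hash_string_key(const char *key, size_t len)
{
    assert(key != NULL || len == 0);
    unsigned long h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char)key[i];
    return h;
}

/* Integer keys are their own hash value. */
static unsigned long hash_int_key(long key)
{
    return (unsigned long)key;
}
```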

Hash-table itself

We've already discussed the elements (buckets) and their structure, but there's also the hash-table itself, which is, in fact, the array data structure. It's called _hashtable:

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;   /* Used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

I won't describe all the fields. I've already provided a lot of information and want to keep to what's relevant to the question. Here is a brief description of this structure:

  • arBuckets is what was described above: the bucket storage.
  • pListHead / pListTail are pointers to the order-resolution list.
  • nTableSize determines the size of the hash-table and is directly related to memory allocation, because nTableSize is always a power of 2. Whether you have 13 or 14 elements in the array, the actual size will be 16. Take that into account when you estimate an array's size.
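The sizing rule can be sketched as follows. The sketch assumes a minimum table size of 8 and rejects requests larger than 2^31. `table_size_for` and `slot_for` are illustrative names. With a power-of-two size, nTableMask is nTableSize - 1, so a bucket's slot is found with a cheap bitwise AND instead of a modulo.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= n_elements, with an assumed minimum of 8. */
static uint32_t table_size_for(uint32_t n_elements)
{
    assert(n_elements <= (UINT32_C(1) << 31));   /* avoid overflow */
    uint32_t size = 8;
    while (size < n_elements)
        size <<= 1;
    return size;
}

/* With a power-of-two size the mask is size - 1, and h & mask picks the
 * slot (the same result as h % size, without a division). */
static uint32_t slot_for(unsigned long h, uint32_t table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return (uint32_t)(h & (table_size - 1));
}
```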

Conclusion

It's really difficult to predict whether one array will be larger than the other in your case. Yes, some guidelines follow from the internal structure. But if the string keys are comparable in length to the integer values (like 'four', 'one' in your sample), the real difference comes down to how many collisions occurred and how many bytes were allocated to store the values.

But choosing the proper structure should be a matter of sense, not memory. If your intention is to build the corresponding indexed data, the choice will always be obvious. The post above has only one goal: to show how arrays actually work in PHP and where the difference in memory allocation in your sample comes from.

You may also check the article about arrays & hash-tables in PHP, "Hash-tables in PHP" from the PHP Internals Book. I've used some graphics from there. To understand how values are allocated in PHP, check the zval Structure article. It may help you see how allocating strings differs from allocating integers as array values. I didn't include explanations from it here. The more important point for me is to show the array data structure and what difference string keys vs. integer keys can make for your question.
