list() uses slightly more memory than list comprehension

Question

So I was playing with list objects and found a slightly strange thing: if a list is created with list(), it uses more memory than a list comprehension. I'm using Python 3.5.2.

In [1]: import sys
In [2]: a = list(range(100))
In [3]: sys.getsizeof(a)
Out[3]: 1008
In [4]: b = [i for i in range(100)]
In [5]: sys.getsizeof(b)
Out[5]: 912
In [6]: type(a) == type(b)
Out[6]: True
In [7]: a == b
Out[7]: True
In [8]: sys.getsizeof(list(b))
Out[8]: 1008

From the docs:

Lists may be constructed in several ways:

  • Using a pair of square brackets to denote the empty list: []
  • Using square brackets, separating items with commas: [a], [a, b, c]
  • Using a list comprehension: [x for x in iterable]
  • Using the type constructor: list() or list(iterable)

But it seems that list() uses more memory.

The bigger the list, the bigger the gap.

Why is this happening?

Update #1

Test with Python 3.6.0b2:

Python 3.6.0b2 (default, Oct 11 2016, 11:52:53) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof(list(range(100)))
1008
>>> sys.getsizeof([i for i in range(100)])
912

Update #2

Test with Python 2.7.12:

Python 2.7.12 (default, Jul  1 2016, 15:12:24) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof(list(xrange(100)))
1016
>>> sys.getsizeof([i for i in xrange(100)])
920

Recommended answer

I think you're seeing over-allocation patterns. This is an excerpt from the CPython source (list_resize() in Objects/listobject.c):

/* This over-allocates proportional to the list size, making room
 * for additional growth.  The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 */

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
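The growth pattern in that comment can be reproduced with a small Python model of the formula. This is a sketch assuming the CPython 3.5 rule quoted above (newsize plus newsize >> 3 plus 3 or 6, with 0 for an empty list); overallocate and capacities_when_appending are hypothetical helper names, not CPython functions.

```python
def overallocate(newsize):
    """Model of the capacity CPython 3.5's list_resize() picks for newsize items.

    Assumption: mirrors the formula quoted above; later CPython versions differ.
    """
    if newsize == 0:
        return 0
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)


def capacities_when_appending(n):
    """Capacities a list passes through when n items are appended one at a time."""
    allocated = 0
    capacities = [allocated]
    for size in range(1, n + 1):
        if size > allocated:  # buffer is full: resize so the new item fits
            allocated = overallocate(size)
            capacities.append(allocated)
    return capacities


print(capacities_when_appending(88))
# [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
```

The printed capacities match the "0, 4, 8, 16, 25, 35, 46, 58, 72, 88" sequence in the source comment.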


Printing the sizes of list comprehensions of lengths 0-88 shows that the pattern matches:

import sys

# record sys.getsizeof() for comprehensions of lengths 0-89
comprehensions = [sys.getsizeof([1 for _ in range(l)]) for l in range(90)]

# only keep the lengths where adding one more item grew the allocation
steps = zip(comprehensions, comprehensions[1:])
growths = [x for x in enumerate(steps) if x[1][0] != x[1][1]]

# print the results:
for growth in growths:
    print(growth)

Results, in the format (list length, (old total size, new total size)):

(0, (64, 96)) 
(4, (96, 128))
(8, (128, 192))
(16, (192, 264))
(25, (264, 344))
(35, (344, 432))
(46, (432, 528))
(58, (528, 640))
(72, (640, 768))
(88, (768, 912))
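To read these byte sizes as capacities, subtract the size of an empty list and divide by the pointer size. On the 64-bit build above, (96 - 64) / 8 = 4 slots, (128 - 64) / 8 = 8, (192 - 64) / 8 = 16, and so on, which is the growth pattern from the source comment. A sketch assuming CPython, where sys.getsizeof() on a list counts the header plus every allocated slot, used or not; slots is a hypothetical helper.

```python
import struct
import sys

# Size of one slot in the list's item array (a PyObject* pointer) on this build.
POINTER_SIZE = struct.calcsize("P")
# Assumption (CPython): an empty list literal allocates no slots, so this is the header.
EMPTY_LIST_SIZE = sys.getsizeof([])


def slots(lst):
    """Estimate how many item slots a CPython list has allocated (used or spare)."""
    return (sys.getsizeof(lst) - EMPTY_LIST_SIZE) // POINTER_SIZE


for n in (0, 4, 5, 8, 9, 88, 89):
    lst = [1 for _ in range(n)]
    print(n, len(lst), slots(lst))
```

The exact slot counts depend on the interpreter version, but the allocated slots are always at least the list length.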


The over-allocation is done for performance reasons: it lets lists grow without allocating more memory on every append (better amortized performance).
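The amortization argument can be checked with a rough cost model. This sketch assumes the worst case the source comment guards against, a realloc() that always copies every existing element to a new buffer, and reuses the CPython 3.5 formula; the function names are hypothetical.

```python
def overallocate(newsize):
    """Model of CPython 3.5's list_resize() capacity rule (assumption)."""
    if newsize == 0:
        return 0
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)


def copies_with_overallocation(n):
    """Worst-case element copies for n appends if every resize moves the whole buffer."""
    allocated = copies = 0
    for size in range(1, n + 1):
        if size > allocated:
            copies += size - 1  # existing items moved into the new buffer
            allocated = overallocate(size)
    return copies


def copies_without_overallocation(n):
    """Same cost model, but the buffer grows by exactly one slot per append."""
    return sum(size - 1 for size in range(1, n + 1))


n = 100000
print(copies_with_overallocation(n) / n)     # a small constant per append
print(copies_without_overallocation(n) / n)  # 49999.5, grows with n
```

Because each resize grows the capacity by at least one eighth, the total copying stays below 9n, whereas growing by one slot per append costs about n*n/2 copies.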

A probable reason for the difference is that a list comprehension cannot know in advance the size of the list it will produce, whereas list() can. The comprehension therefore grows the list step by step as it fills it, over-allocating at each growth step, until it has finally filled it.
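The difference in what each construct knows can be observed from Python. operator.length_hint() exposes the length-hint protocol (__len__, then __length_hint__) that CPython's list() consults before it starts copying items; a comprehension never asks and simply appends. A minimal sketch:

```python
import operator

# range knows its exact length, so list(range(100)) can size its buffer up front.
print(operator.length_hint(range(100)))         # 100
# List iterators report how many items remain.
print(operator.length_hint(iter([1, 2, 3])))     # 3
# A generator carries no size information, so the default (0) is returned.
print(operator.length_hint(i for i in range(100)))  # 0
```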

Once the comprehension is done, it does not grow the buffer any further, so the final list carries only the spare slots left over from its last growth step (adding more at that point would serve no purpose, since the buffer only exists to absorb future appends).

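list(), on the other hand, knows the final list size in advance (it asks the iterable for its length), so it resizes once to that size, and the over-allocation formula then adds a buffer proportional to the full list size.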
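Plugging the question's numbers into the same growth formula shows where 1008 and 912 come from. This is a sketch assuming CPython 3.5's list_resize() rule, plus the 64-byte empty list and 8-byte slots visible in the results above (the first growth step goes from 64 to 96 bytes for 4 slots); the helper names are hypothetical.

```python
def overallocate(newsize):
    """Model of CPython 3.5's list_resize() capacity rule (assumption)."""
    if newsize == 0:
        return 0
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)


def capacity_via_list_ctor(n):
    """list(iterable) with a known length: a single resize to n items."""
    return overallocate(n)


def capacity_via_comprehension(n):
    """Comprehension: one append at a time, resizing only when the buffer is full."""
    allocated = 0
    for size in range(1, n + 1):
        if size > allocated:
            allocated = overallocate(size)
    return allocated


HEADER, SLOT = 64, 8  # empty-list size and slot size on the asker's 64-bit build

n = 100
ctor, comp = capacity_via_list_ctor(n), capacity_via_comprehension(n)
print(ctor, comp)                                  # 118 106
print(HEADER + ctor * SLOT, HEADER + comp * SLOT)  # 1008 912
```

With a 72-byte empty list instead, the same slot counts give 1016 and 920, the Python 2.7 numbers in the question. Later CPython releases changed the resize formula, so a current interpreter may print different sizes.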

Another piece of supporting evidence, also from the source, is that list comprehensions invoke LIST_APPEND, which appends through CPython's internal list_resize(). That in turn indicates the pre-allocation buffer is consumed without knowing how much of it will be filled. This is consistent with the behavior you're seeing.
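The LIST_APPEND claim can be checked with the standard dis module. The sketch below compiles both expressions and collects every opcode name, including those in nested code objects (before Python 3.12 a comprehension is compiled into its own code object); opnames is a hypothetical helper.

```python
import dis
import types


def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)  # before 3.12 the comprehension body lives here
    return names


comprehension = compile("[i for i in range(100)]", "<comprehension>", "eval")
constructor = compile("list(range(100))", "<constructor>", "eval")

print("LIST_APPEND" in opnames(comprehension))  # True
print("LIST_APPEND" in opnames(constructor))    # False
```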

To conclude, list() pre-allocates extra slots as a function of the final list size (the sizes below were evidently measured on a 32-bit build: each extra slot adds 4 bytes):

>>> sys.getsizeof(list([1,2,3]))
60
>>> sys.getsizeof(list([1,2,3,4]))
64

A list comprehension does not know the list size, so it uses append operations as it grows, depleting the pre-allocation buffer:

# one item short of filling the pre-allocation buffer (3 of 4 slots used)
>>> sys.getsizeof([i for i in [1,2,3]])
52
# fills the pre-allocation buffer completely
# note that the size did not change: the 4th item went into the last spare slot
>>> sys.getsizeof([i for i in [1,2,3,4]])
52
# the 5th item does not fit, so the pre-allocation buffer grows
>>> sys.getsizeof([i for i in [1,2,3,4,5]])
68
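The small numbers in this part of the answer can also be reproduced with the CPython 3.5 growth formula. This is a sketch, not a measurement: the 36-byte empty-list size and 4-byte slot are assumptions inferred from the session output above (60 - 6 * 4 = 36), and size_list_ctor and size_comprehension are hypothetical helpers.

```python
def overallocate(newsize):
    """Model of CPython 3.5's list_resize() capacity rule (assumption)."""
    if newsize == 0:
        return 0
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)


HEADER, SLOT = 36, 4  # assumed 32-bit build, inferred from the sizes shown above


def size_list_ctor(n):
    """list(some_list): one resize straight to n items."""
    return HEADER + overallocate(n) * SLOT


def size_comprehension(n):
    """Comprehension: grows only when an append finds the buffer full."""
    allocated = 0
    for size in range(1, n + 1):
        if size > allocated:
            allocated = overallocate(size)
    return HEADER + allocated * SLOT


print(size_list_ctor(3), size_list_ctor(4))        # 60 64
print([size_comprehension(n) for n in (3, 4, 5)])  # [52, 52, 68]
```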
