How much memory will a list with one million elements take up in Python?
Question
There are more than a million subreddits on Reddit, according to redditmetrics.com.
I wrote a script that repeatedly queries this Reddit API endpoint until all the subreddits are stored in an array, `all_subs`:
```python
all_subs = []
for sub in <repeated request here>:
    all_subs.append({"name": display_name, "subscribers": subscriber_count})
```
The script has been running for close to ten hours, and it's about halfway done (it gets rate-limited every three or four requests). When it's finished, I expect an array like this:
```python
[
    { "name": "AskReddit", "subscribers": 16751677 },
    { "name": "news", "subscribers": 13860169 },
    { "name": "politics", "subscribers": 3350326 },
    ... # plus one million more entries
]
```
Approximately how much space in memory will this list take up?
This depends on your Python version and your system, but I'll help you figure out roughly how much memory it will take. First things first: `sys.getsizeof` only returns the memory used by the container object itself, not by the elements in the container.
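To see this shallow behavior concretely, here is a small sketch (CPython assumed; `big` and `holder` are illustrative names). The list's reported size does not grow with the size of the object it points to:

```python
import sys

big = "x" * 1_000_000  # an ASCII string of one million characters (~1 MB)
holder = [big]         # a list holding a single reference to it

print(sys.getsizeof(big))     # about 1 MB: the string's own storage
print(sys.getsizeof(holder))  # a few dozen bytes: list header plus one pointer
```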
The `sys.getsizeof` documentation says:

> Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
>
> If given, `default` will be returned if the object does not provide means to retrieve the size. Otherwise a `TypeError` will be raised.
>
> `getsizeof()` calls the object's `__sizeof__` method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
>
> See recursive sizeof recipe for an example of using `getsizeof()` recursively to find the size of containers and all their contents.
So, I've loaded that recipe, which defines `total_size`, into an interactive interpreter session.
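The recipe's code isn't reproduced on this page. The following is a minimal sketch modeled on it, not the recipe verbatim: it walks the built-in containers (and their subclasses, such as `namedtuple`), adds up `sys.getsizeof` for every object it reaches, and counts each object only once by `id()`. It is named `total_size` to match the transcripts below.

```python
import sys
from collections import deque
from itertools import chain


def total_size(obj):
    """Approximate memory footprint of obj plus everything it (transitively) contains.

    Sketch modeled on the recursive sizeof recipe. Only built-in containers
    (tuple, list, deque, dict, set, frozenset) and their subclasses are walked;
    any other object contributes just its own sys.getsizeof().
    """
    handlers = {
        tuple: iter,  # also matches namedtuple instances, which subclass tuple
        list: iter,
        deque: iter,
        dict: lambda d: chain.from_iterable(d.items()),  # keys and values
        set: iter,
        frozenset: iter,
    }
    seen = set()                     # ids of objects already counted
    default_size = sys.getsizeof(0)  # fallback for objects without __sizeof__

    def sizeof(o):
        if id(o) in seen:  # a shared object is counted only once
            return 0
        seen.add(id(o))
        size = sys.getsizeof(o, default_size)
        for typ, handler in handlers.items():
            if isinstance(o, typ):
                size += sum(map(sizeof, handler(o)))
                break
        return size

    return sizeof(obj)


# Example: one record shaped like the question's data.
print(total_size([{"name": "AskReddit", "subscribers": 16751677}]))
```

Because of the `seen` set, an object referenced from many places (for example, the same string stored in a million records) contributes its size only once.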
A CPython `list` is a heterogeneous, resizable array list. The underlying array only contains pointers to `PyObject`s, and each pointer takes up one machine word. On a 64-bit system that is 64 bits, or 8 bytes, so the container alone for a list of 1,000,000 elements takes up roughly 8 million bytes, or 8 megabytes. Building a list `x` with 1,000,000 entries (starting from an empty `x = []`) bears that out:
```python
In [6]: for i in range(1000000):
   ...:     x.append([])
   ...:
In [7]: import sys
In [8]: sys.getsizeof(x)
Out[8]: 8697464
```
The extra memory is the list object's own overhead plus the spare space the underlying array keeps at the end so that `.append` operations are efficient.
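A rough way to watch that spare capacity appear, assuming CPython (where a list's reported size is its header plus one pointer per *allocated* slot, not per element). `growth_steps` is a hypothetical helper written for this illustration:

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # bytes per pointer slot (8 on a typical 64-bit build)
EMPTY_LIST = sys.getsizeof([])       # reported size of a list with no slots allocated


def growth_steps(n):
    """Append n items; record (len, sys.getsizeof) whenever the reported size changes."""
    lst = []
    steps = [(0, sys.getsizeof(lst))]
    for _ in range(n):
        lst.append(None)
        size = sys.getsizeof(lst)
        if size != steps[-1][1]:
            steps.append((len(lst), size))
    return steps


for length, size in growth_steps(100):
    slots = (size - EMPTY_LIST) // POINTER_SIZE  # pointer slots currently allocated
    print(f"len={length:3d}  getsizeof={size:4d}  allocated slots={slots}")
```

The reported size only jumps now and then; the gap between the allocated slots and the length is the headroom that lets most appends proceed without reallocating.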
Dictionaries, on the other hand, are rather heavyweight in Python. Just the empty container:
```python
In [10]: sys.getsizeof({})
Out[10]: 288
```
So a lower bound on the size of one million dicts is 288,000,000 bytes. Adding 8 bytes per pointer for the list itself gives a rough lower bound for the whole thing:
```python
In [12]: 1000000*288 + 1000000*8
Out[12]: 296000000
In [13]: 296000000 * 1e-9  # gigabytes
Out[13]: 0.29600000000000004
```
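The same back-of-the-envelope arithmetic can be written as a small hypothetical helper that measures a sample record instead of hard-coding 288 bytes, so the estimate reflects whichever Python version you run it on:

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # 8 bytes on a typical 64-bit build


def rough_lower_bound(n, sample):
    """Shallow size of n records like `sample`, plus one list pointer per record, in bytes.

    Mirrors the n*288 + n*8 arithmetic: it ignores the list header, its spare
    capacity, and the str/int objects each record points to.
    """
    return n * (sys.getsizeof(sample) + POINTER_SIZE)


# Hypothetical sample record shaped like the question's data.
sample = {"name": "AskReddit", "subscribers": 16751677}
estimate = rough_lower_bound(1_000_000, sample)
print(f"{estimate:,} bytes, about {estimate / 1e9:.2f} GB")
```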
So you can expect about 0.3 gigabytes of memory. Using the recipe and a more realistic `dict`:
```python
In [16]: x = []
    ...: for i in range(1000000):
    ...:     x.append(dict(name="my name is what", subscribers=23456644))
    ...:
In [17]: total_size(x)
Out[17]: 296697669
```
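One caveat: in the transcript above every dict refers to the same string and the same int, and the recipe counts a shared object only once, so the per-record names and subscriber counts cost almost nothing. Real subreddit data has a distinct name and count per record. Below is a minimal sketch comparing the two cases with the standard library's `tracemalloc`, a different measurement technique from the recipe (it counts memory blocks allocated by Python). `traced_bytes`, the record builders, and the generated names are hypothetical:

```python
import tracemalloc

N = 50_000  # smaller than the real ~1M records so the sketch runs quickly


def traced_bytes(build):
    """Bytes tracemalloc reports as allocated while build()'s result is still alive."""
    tracemalloc.start()
    try:
        result = build()  # keep a reference so the records stay alive while measuring
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current


def shared_records():
    # Every dict refers to the same str and int objects, as in the transcript.
    return [dict(name="my name is what", subscribers=23456644) for _ in range(N)]


def distinct_records():
    # Hypothetical data: its own name and count per record, like real subreddits.
    return [dict(name=f"subreddit_{i}", subscribers=10_000_000 + i) for i in range(N)]


shared = traced_bytes(shared_records)
distinct = traced_bytes(distinct_records)
print(f"shared values:   {shared:,} bytes")
print(f"distinct values: {distinct:,} bytes")
```

Expect the distinct case to come out noticeably larger, since each record then owns its own `str` and `int` objects.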
So, about 0.3 GB. That's not a lot on a modern system, but if you want to save space, use a `tuple`, or even better, a `namedtuple`:
```python
In [24]: from collections import namedtuple
In [25]: Record = namedtuple('Record', "name subscribers")
In [26]: x = []
    ...: for i in range(1000000):
    ...:     x.append(Record(name="my name is what", subscribers=23456644))
    ...:
In [27]: total_size(x)
Out[27]: 72697556
```
Or, in gigabytes:

```python
In [29]: total_size(x)*1e-9
Out[29]: 0.07269755600000001
```
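The saving comes from how a `namedtuple` instance is stored: it is a plain tuple (the generated class adds no per-instance attribute dictionary), so its shallow size equals that of an ordinary tuple with the same fields, and both are much smaller than a dict. A quick sketch to check this on your own interpreter, using a hypothetical sample record shaped like the question's data:

```python
import sys
from collections import namedtuple

Record = namedtuple("Record", "name subscribers")

# Hypothetical sample record shaped like the question's data.
as_dict = {"name": "AskReddit", "subscribers": 16751677}
as_tuple = ("AskReddit", 16751677)
as_record = Record("AskReddit", 16751677)

# Shallow sizes: the str and int the containers point to are not included.
for label, obj in [("dict", as_dict), ("tuple", as_tuple), ("namedtuple", as_record)]:
    print(f"{label:10s} {sys.getsizeof(obj)} bytes")
```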
A `namedtuple` works just like a `tuple`, but you can access the fields by name:
```python
In [30]: r = x[0]
In [31]: r.name
Out[31]: 'my name is what'
In [32]: r.subscribers
Out[32]: 23456644
```
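To apply this to the script in the question, the records can be built as `Record`s instead of dicts. Here is a sketch, assuming the API results arrive as dicts whose keys match the field names. The sample `raw` data is hypothetical and copied from the expected output in the question:

```python
from collections import namedtuple

Record = namedtuple("Record", "name subscribers")

# Hypothetical API results, shaped like the expected output in the question.
raw = [
    {"name": "AskReddit", "subscribers": 16751677},
    {"name": "news", "subscribers": 13860169},
]

# Keyword unpacking works because each dict's keys match Record's field names.
records = [Record(**d) for d in raw]

print(records[0].name)          # AskReddit
print(records[1].subscribers)   # 13860169
print(records[0]._asdict())     # back to a dict-like mapping, if one is needed later
biggest = max(records, key=lambda r: r.subscribers)
print(biggest.name)             # AskReddit
```

In the question's loop, the same idea is `all_subs.append(Record(display_name, subscriber_count))`.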