How much memory will a list with one million elements take up in Python?

Question

There are more than a million subreddits on Reddit, according to redditmetrics.com.

I wrote a script that repeatedly queries this Reddit API endpoint until all the subreddits are stored in an array, all_subs:

all_subs = []
for sub in <repeated request here>:
    all_subs.append({"name": display_name, "subscribers": subscriber_count})

The script has been running for close to ten hours, and it's about halfway done (it gets rate-limited every three or four requests). When it's finished, I expect an array like this:

[
    { "name": "AskReddit", "subscribers", 16751677 },
    { "name": "news", "subscribers", 13860169 },
    { "name": "politics", "subscribers", 3350326 },
    ... # plus one million more entries
]

Approximately how much space in memory will this list take up?

Solution

This depends on your Python version and your system, but I'll help you work out roughly how much memory it will take. First things first: sys.getsizeof only returns the memory use of the object representing the container, not of all the elements inside the container. Quoting the sys.getsizeof documentation:

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

If given, default will be returned if the object does not provide means to retrieve the size. Otherwise a TypeError will be raised.

getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

See recursive sizeof recipe for an example of using getsizeof() recursively to find the size of containers and all their contents.
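To make the distinction concrete, here is a small illustrative snippet (an addition for this write-up, not part of the original answer): the outer list reports only its own size, no matter how large its elements are.

import sys

outer = [[0] * 1000, [1] * 1000]
print(sys.getsizeof(outer))     # small: only the outer list object is measured
print(sys.getsizeof(outer[0]))  # each inner list has to be measured separately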

So, I've loaded up that recipe in an interactive interpreter session:
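The recipe itself is not reproduced here; a minimal sketch of such a helper, modeled on the standard recursive sizeof recipe (the details below are a reconstruction, not the author's exact code), looks like this:

import sys
from itertools import chain

def total_size(obj):
    """Approximate memory footprint of obj and everything it references."""
    # Handlers yield the contents of each supported container type.
    handlers = {
        list: iter,
        tuple: iter,
        set: iter,
        frozenset: iter,
        dict: lambda d: chain.from_iterable(d.items()),
    }
    seen = set()  # ids of objects already counted, to avoid double-counting

    def sizeof(o):
        if id(o) in seen:
            return 0
        seen.add(id(o))
        size = sys.getsizeof(o)
        for container_type, get_contents in handlers.items():
            if isinstance(o, container_type):
                size += sum(map(sizeof, get_contents(o)))
                break
        return size

    return sizeof(obj)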

So, a CPython list is actually a heterogeneous, resizable array. The underlying array contains only pointers to PyObjects. A pointer takes up one machine word of memory; on a 64-bit system that is 64 bits, i.e. 8 bytes. So, just for the container, a list of size 1,000,000 will take up roughly 8 million bytes, or 8 megabytes. Building a list with 1,000,000 entries bears that out:

In [5]: x = []

In [6]: for i in range(1000000):
   ...:     x.append([])
   ...:

In [7]: import sys

In [8]: sys.getsizeof(x)
Out[8]: 8697464

The extra memory is accounted for by the overhead of the Python object itself, plus the extra space the underlying array leaves at the end to allow for efficient .append operations (CPython over-allocates when the list grows).
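You can watch that over-allocation happen. This small demo (an illustration added here; exact sizes vary by Python version and platform) prints the list's size only when the underlying pointer array actually grows:

import sys

x = []
last = sys.getsizeof(x)
for i in range(20):
    x.append(None)
    size = sys.getsizeof(x)
    if size != last:  # a jump means CPython resized (over-allocated) the array
        print(f"len={len(x):2d}  size={size} bytes")
        last = size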

Now, a dictionary is rather heavy-weight in Python. Just the container:

In [10]: sys.getsizeof({})
Out[10]: 288

So a lower bound on the size of 1 million dicts is 288,000,000 bytes. Adding the roughly 8 million bytes for the list's own pointer array gives a rough lower bound:

In [12]: 1000000*288 + 1000000*8
Out[12]: 296000000

In [13]: 296000000 * 1e-9 # gigabytes
Out[13]: 0.29600000000000004

So you can expect about 0.3 gigabytes worth of memory. Using the recipe and a more realistic dict:

In [16]: x = []
    ...: for i in range(1000000):
    ...:     x.append(dict(name="my name is what", subscribers=23456644))
    ...:

In [17]: total_size(x)
Out[17]: 296697669

So, about 0.3 gigs. Now, that's not a lot on a modern system. But if you wanted to save space, you should use a tuple or, even better, a namedtuple:

In [24]: from collections import namedtuple

In [25]: Record = namedtuple('Record', "name subscribers")

In [26]: x = []
    ...: for i in range(1000000):
    ...:     x.append(Record(name="my name is what", subscribers=23456644))
    ...:

In [27]: total_size(x)
Out[27]: 72697556

Or, in gigabytes:

In [29]: total_size(x)*1e-9
Out[29]: 0.07269755600000001

A namedtuple works just like a tuple, but you can access the fields by name:

In [30]: r = x[0]

In [31]: r.name
Out[31]: 'my name is what'

In [32]: r.subscribers
Out[32]: 23456644
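To see where the savings come from, here is a quick side-by-side comparison (an illustrative addition; the exact byte counts depend on your Python version). A namedtuple instance is stored like a plain tuple, a flat array of pointers, while every dict carries its own hash table; the field names live once on the Record class, not in each instance:

import sys
from collections import namedtuple

Record = namedtuple('Record', "name subscribers")

d = {"name": "AskReddit", "subscribers": 16751677}
t = ("AskReddit", 16751677)
r = Record(name="AskReddit", subscribers=16751677)

# The dict is several times larger than either tuple-based record.
print(sys.getsizeof(d), sys.getsizeof(t), sys.getsizeof(r))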
