How much memory will a list with one million elements take up in Python?


Question

There are more than a million subreddits on Reddit, according to redditmetrics.com.

I wrote a script that repeatedly queries this Reddit API endpoint until all the subreddits are stored in an array, all_subs:

all_subs = []
for sub in <repeated request here>:
    all_subs.append({"name": display_name, "subscribers": subscriber_count})

The script has been running for close to ten hours, and it's about halfway done (it gets rate-limited every three or four requests). When it's finished, I expect an array like this:

[
    { "name": "AskReddit", "subscribers": 16751677 },
    { "name": "news", "subscribers": 13860169 },
    { "name": "politics", "subscribers": 3350326 },
    ... # plus one million more entries
]

How much memory will this list take up?

Answer

This depends on your Python version and your system, but I'll help you figure out roughly how much memory it will take. First things first: sys.getsizeof only returns the memory used by the object representing the container, not by all the elements in the container.

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

If given, default will be returned if the object does not provide means to retrieve the size. Otherwise a TypeError will be raised.

getsizeof() calls the object's __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

See recursive sizeof recipe for an example of using getsizeof() recursively to find the size of containers and all their contents.

So, I've loaded up that recipe in an interactive interpreter session:
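For reference, this is roughly what that recipe looks like. A condensed sketch adapted from the one linked in the sys.getsizeof documentation; the helper is named total_size to match the calls below:

from sys import getsizeof
from itertools import chain
from collections import deque

def total_size(obj, handlers={}):
    """Approximate total memory footprint of obj and everything it references."""
    # Map container types to a function that yields their contents.
    all_handlers = {
        tuple: iter, list: iter, deque: iter, set: iter, frozenset: iter,
        dict: lambda d: chain.from_iterable(d.items()),
    }
    all_handlers.update(handlers)   # user-supplied handlers take precedence
    seen = set()                    # ids already counted, so shared objects count once
    default_size = getsizeof(0)     # fallback when an object reports no size

    def sizeof(o):
        if id(o) in seen:
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)
        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(obj)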

So, a CPython list is actually a heterogeneous, resizable array list. The underlying array contains only pointers to PyObjects, and each pointer takes up one machine word of memory. On a 64-bit system that is 64 bits, i.e. 8 bytes. So, just for the container, a list of size 1,000,000 will take up roughly 8 million bytes, or about 8 megabytes. Building a list with 1,000,000 entries bears that out:

In [5]: x = []

In [6]: for i in range(1000000):
   ...:     x.append([])
   ...:

In [7]: import sys

In [8]: sys.getsizeof(x)
Out[8]: 8697464

The extra memory is accounted for by the overhead of the Python object itself, plus the extra space the underlying array leaves at the end to allow for efficient .append operations.
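You can actually watch that over-allocation happen. A small sketch (the exact growth steps vary by CPython version):

import sys

x = []
prev = sys.getsizeof(x)
for _ in range(64):
    x.append(None)
    size = sys.getsizeof(x)
    if size != prev:
        # The size jumps only occasionally: CPython grows the array in chunks
        # so that repeated appends run in amortized O(1) time.
        print(f"len={len(x):>2}  size={size} bytes")
        prev = size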

Now, a dictionary is rather heavyweight in Python. Just the container:

In [10]: sys.getsizeof({})
Out[10]: 288

So a lower bound on the size of one million dicts is 288,000,000 bytes. A rough lower bound for the whole list, then:

In [12]: 1000000*288 + 1000000*8
Out[12]: 296000000

In [13]: 296000000 * 1e-9 # gigabytes
Out[13]: 0.29600000000000004

So you can expect about 0.3 gigabytes' worth of memory. Using the recipe with a more realistic dict:

In [16]: x = []
    ...: for i in range(1000000):
    ...:     x.append(dict(name="my name is what", subscribers=23456644))
    ...:

In [17]: total_size(x)
Out[17]: 296697669


So, about 0.3 gigs. That's not a lot on a modern system. But if you wanted to save space, you should use a tuple, or even better, a namedtuple:

In [24]: from collections import namedtuple

In [25]: Record = namedtuple('Record', "name subscribers")

In [26]: x = []
    ...: for i in range(1000000):
    ...:     x.append(Record(name="my name is what", subscribers=23456644))
    ...:

In [27]: total_size(x)
Out[27]: 72697556

Or, in gigabytes:

In [29]: total_size(x)*1e-9
Out[29]: 0.07269755600000001

A namedtuple works just like a tuple, but you can access the fields by name:

In [30]: r = x[0]

In [31]: r.name
Out[31]: 'my name is what'

In [32]: r.subscribers
Out[32]: 23456644
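The per-record saving is easy to check directly. The exact byte counts depend on your Python version, but the ratio is typical (container overhead only; the str and int payloads are shared either way):

import sys
from collections import namedtuple

Record = namedtuple('Record', 'name subscribers')

d = {"name": "AskReddit", "subscribers": 16751677}
t = Record(name="AskReddit", subscribers=16751677)

print(sys.getsizeof(d))  # typically a couple hundred bytes per dict
print(sys.getsizeof(t))  # well under 100 bytes: just a fixed-size array of pointers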

