Python: size of strings in memory

Question

Consider the following code:

arr = []
for (str, id, flag) in some_data:
    arr.append((str, id, flag))

Imagine the input strings are 2 characters long on average and 5 characters at most, and some_data has 1 million elements. What would the memory requirement of such a structure be?

Could it be that a lot of memory is wasted on the strings? If so, how can I avoid that?

Recommended answer

In this case, because the strings are quite short and there are so many of them, you stand to save a fair bit of memory by calling intern on the strings (intern is a builtin in Python 2; in Python 3 it is sys.intern). Assuming the strings contain only lowercase letters, there are just 26 * 26 = 676 possible two-character strings, so a list of a million entries must contain a lot of repetitions. intern ensures that those repetitions do not each become a separate object; instead, all equal strings refer to the same underlying object.
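As a minimal sketch of that idea, assuming Python 3 (where the function is `sys.intern`), the loop from the question could intern each string as it is appended. The sample `some_data` below is made up for illustration, and the loop variables are renamed to `text` and `ident` only to avoid shadowing the builtins `str` and `id`.

```python
import sys

# Hypothetical sample data: (string, id, flag) rows with many repeated strings.
# The strings are built at runtime so that equal values are not automatically
# guaranteed to be the same object.
some_data = [("".join(["a", "bc"[i % 2]]), i, i % 2 == 0) for i in range(1000)]

arr = []
for text, ident, flag in some_data:
    # sys.intern returns the canonical object for this string value,
    # so equal strings end up sharing a single object.
    arr.append((sys.intern(text), ident, flag))

# Count distinct string objects (by identity) and distinct string values.
distinct_objects = len({id(row[0]) for row in arr})
distinct_values = len({row[0] for row in arr})
print(distinct_objects, distinct_values)
```

After interning, the number of distinct string objects referenced by `arr` matches the number of distinct string values, regardless of how many rows there are.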

It's possible that Python already interns short strings, but judging from a number of different sources this seems to be highly implementation-dependent. So calling intern explicitly is probably the way to go in this case; YMMV.
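One way to see what a particular interpreter does is to build two equal strings at runtime and compare their identities. This is only a rough probe, assuming Python 3; `runtime_copy` is a hypothetical helper, and the first result is deliberately not stated here because it may differ between implementations and versions.

```python
import sys

def runtime_copy(text):
    # Rebuild the string from its characters so the result is constructed
    # at runtime instead of being a shared compile-time constant.
    return "".join(list(text))

a = runtime_copy("ab")
b = runtime_copy("ab")

# Whether this is True without explicit interning depends on the implementation.
shared_without_intern = a is b

# After explicit interning, equal strings are the same object.
shared_with_intern = sys.intern(a) is sys.intern(b)

print("shared without intern:", shared_without_intern)
print("shared with intern:", shared_with_intern)
```

If the first line prints `False` on your interpreter, equal strings built at runtime are not being shared automatically, and explicit interning is what provides the sharing.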

To see why this is very likely to save memory, consider the following:

>>> import sys
>>> sys.getsizeof('')
40
>>> sys.getsizeof('a')
41
>>> sys.getsizeof('ab')
42
>>> sys.getsizeof('abc')
43

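Adding a single character to a string increases its size by only one byte, but every string carries a fixed overhead of 40 bytes in the session shown here (the exact figure depends on the Python version and platform). With a million two-character strings, that per-object overhead dominates unless equal strings share one object.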
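The effect can be estimated roughly with `sys.getsizeof`. The sketch below assumes Python 3 and uses made-up data (100,000 rows drawn from the 676 two-letter strings); `string_bytes` is a hypothetical helper. It counts only the string objects themselves, not the tuples, the list, or the other fields, so it is an illustration of the per-string overhead and not a full memory measurement.

```python
import sys

def string_bytes(rows):
    """Return (naive, shared) byte totals for the strings in rows.

    naive  : every row is charged for its own string object
    shared : each distinct string object (by identity) is charged once
    """
    naive = sum(sys.getsizeof(row[0]) for row in rows)
    seen = {}
    for row in rows:
        seen[id(row[0])] = sys.getsizeof(row[0])
    shared = sum(seen.values())
    return naive, shared

# Hypothetical data: 100,000 rows drawn from 676 two-letter strings, interned.
letters = "abcdefghijklmnopqrstuvwxyz"
rows = [
    (sys.intern(letters[i % 26] + letters[(i // 26) % 26]), i, True)
    for i in range(100_000)
]

naive, shared = string_bytes(rows)
print("charged per row:   ", naive)
print("charged per object:", shared)
```

The first figure approximates what the strings would cost if every row held its own copy; the second is what they cost when equal strings share one interned object.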
