Python: size of strings in memory
Question
Consider the following code:
arr = []
for (str, id, flag) in some_data:
    arr.append((str, id, flag))
Imagine the input strings are 2 characters long on average and 5 characters at most, and some_data has 1 million elements. What will the memory requirement of such a structure be?
Could it be that a lot of memory is wasted on the strings? If so, how can I avoid that?
Recommended answer
In this case, because the strings are quite short and there are so many of them, you stand to save a fair bit of memory by calling intern on the strings. Assuming the strings contain only lowercase letters, a typical two-character string has 26 * 26 = 676 possible values, so there must be a lot of repetition in this list; intern ensures that those repetitions don't produce separate objects, but all refer to the same underlying object.
It's possible that Python already interns short strings, but judging from a number of different sources, this seems to be highly implementation-dependent. So calling intern explicitly in this case is probably the way to go; YMMV.
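As a minimal sketch of what that could look like: the example below assumes Python 3, where the built-in intern was moved to sys.intern (in Python 2 it is the plain built-in). The helper name build_rows and the sample data are made up for illustration and are not from the question.

```python
import sys


def build_rows(some_data):
    """Copy (string, id, flag) records into a list, interning each string."""
    rows = []
    for s, ident, flag in some_data:
        # sys.intern returns one canonical object for all equal strings.
        rows.append((sys.intern(s), ident, flag))
    return rows


# Hypothetical sample data: equal strings built at runtime, which an
# interpreter is free to store as separate objects.
raw = ["".join(["a", "b"]) for _ in range(3)]
some_data = [(s, i, True) for i, s in enumerate(raw)]

rows = build_rows(some_data)

# Count distinct string objects before and after interning.
distinct_before = len({id(s) for s in raw})
distinct_after = len({id(row[0]) for row in rows})
print(distinct_before, distinct_after)
```

After interning, every row points at the same string object, so the per-string overhead is paid once per distinct value rather than once per row. How many distinct objects exist before interning depends on the implementation, which is why the explicit call is the safer option.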
As an elaboration on why this is very likely to save memory, consider the following:
>>> import sys
>>> sys.getsizeof('')
40
>>> sys.getsizeof('a')
41
>>> sys.getsizeof('ab')
42
>>> sys.getsizeof('abc')
43
Each additional character adds only one byte to the size of the string, but every string object takes up 40 bytes on its own, even when empty. The exact numbers depend on the Python version and build.
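To get a feel for the numbers on your own interpreter, a rough estimate can be put together with sys.getsizeof. This is only a sketch under several assumptions: Python 3 (sys.intern), random two-letter lowercase strings standing in for the real data, 100,000 rows instead of 1 million to keep it quick, and a hypothetical helper estimate_bytes that counts the list, the tuples, and each distinct string object once while ignoring the integer and boolean objects.

```python
import random
import string
import sys


def estimate_bytes(rows):
    """Rough footprint: the list, its tuples, and each distinct string once."""
    total = sys.getsizeof(rows)
    total += sum(sys.getsizeof(row) for row in rows)
    # Key by object identity so shared (interned) strings are counted once.
    unique_strings = {id(row[0]): row[0] for row in rows}
    total += sum(sys.getsizeof(s) for s in unique_strings.values())
    return total


def random_string():
    """Build a two-letter lowercase string at runtime (stand-in data)."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(2))


random.seed(0)
n = 100_000  # scaled down from the 1 million rows in the question

plain = [(random_string(), i, True) for i in range(n)]
interned = [(sys.intern(s), i, flag) for s, i, flag in plain]

print("without interning:", estimate_bytes(plain))
print("with interning:   ", estimate_bytes(interned))
```

The difference between the two figures is roughly the fixed per-string overhead multiplied by the number of duplicate string objects that interning removes. The list and tuple overhead stays the same in both cases, so interning only addresses the string part of the total.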