How much memory does this byte string actually take up?


Question


My understanding is that os.urandom(size) outputs a random string of bytes of the given "size", but then:

import os
import sys

print(sys.getsizeof(os.urandom(42)))

>>>
75

Why is this not 42?

And a related question:

import base64
import binascii


print(sys.getsizeof(base64.b64encode(os.urandom(42))))
print(sys.getsizeof(binascii.hexlify(os.urandom(42))))

>>>
89
117

Why are these so different? Which encoding would be the most memory-efficient way to store a string of bytes such as the one given by os.urandom?

Edit: It seems like quite a stretch to say that this question is a duplicate of What is the difference between len() and sys.getsizeof() methods in python? My question is not about the difference between len() and getsizeof(). I was confused about the memory used by Python objects in general, which the answer to this question has clarified for me.

Solution

Python byte string objects are more than just the characters that comprise them. They are fully fledged objects. As such they require more space to accommodate the object's components such as the type pointer (needed to identify what kind of object the bytestring even is) and the length (needed for efficiency and because Python bytestrings can contain null bytes).

The simplest object, an object instance, requires space:

>>> sys.getsizeof(object())
16
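The same fixed overhead shows up in the original 75-byte result: sys.getsizeof() reports the object header plus one byte per character of payload, so subtracting the payload length leaves the size of an empty bytes object. A minimal sketch (the exact overhead, 33 bytes on a typical 64-bit CPython build, is implementation-specific):

```python
import os
import sys

data = os.urandom(42)

# Total reported size = fixed bytes-object overhead + one byte per character.
overhead = sys.getsizeof(data) - len(data)

# The overhead is exactly the size of an empty bytes object.
print(overhead == sys.getsizeof(b""))  # True
```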

The second part of your question is simply because the strings produced by b64encode() and hexlify() have different lengths: the latter is 28 characters longer, which, unsurprisingly, is exactly the difference in the values reported by sys.getsizeof().

>>> s1 = base64.b64encode(os.urandom(42))
>>> s1
b'CtlMjDM9q7zp+pGogQci8gr0igJsyZVjSP4oWmMj2A8diawJctV/8sTa'
>>> s2 = binascii.hexlify(os.urandom(42))
>>> s2
b'c82d35f717507d6f5ffc5eda1ee1bfd50a62689c08ba12055a5c39f95b93292ddf4544751fbc79564345'

>>> len(s2) - len(s1)
28
>>> sys.getsizeof(s2) - sys.getsizeof(s1)
28
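The 28-character gap follows directly from the two encodings' expansion ratios: base64 emits 4 characters per 3-byte group (padded up to a full group), while hex emits 2 characters per byte. For 42 input bytes that is 56 versus 84 characters, a difference of 28. A quick check:

```python
import base64
import binascii
import os

data = os.urandom(42)
b64 = base64.b64encode(data)    # 4 chars per 3-byte group, '=' padded
hexed = binascii.hexlify(data)  # 2 chars per byte

print(len(b64))                 # 4 * ceil(42 / 3) = 56
print(len(hexed))               # 2 * 42 = 84
print(len(hexed) - len(b64))    # 28
```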


Unless you use some form of compression, no encoding will be more memory efficient than the binary string you already have. That is especially true here, because the data is random and random data is inherently incompressible.
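The incompressibility of random data is easy to observe with the standard-library zlib module (a sketch; exact sizes vary slightly with zlib version and compression level):

```python
import os
import zlib

random_data = os.urandom(10_000)
patterned_data = bytes(10_000)  # 10,000 zero bytes

# Random bytes gain nothing: the compressor typically falls back to
# stored blocks and only adds a few bytes of framing.
print(len(zlib.compress(random_data)))

# Highly regular bytes shrink dramatically.
print(len(zlib.compress(patterned_data)))
```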
