Calculating the real size of a Python string


Problem description


First of all, these are my computer specs:

Memory - https://gist.github.com/vyscond/6425304

CPU - https://gist.github.com/vyscond/6425322

So this morning I tested the following two code snippets:

code A

a = 'a' * 1000000000

and code B

a = 'a' * 10000000000

Code A works fine, but code B gives me an error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

So I started researching how to measure the size of data in Python.

The first thing I found was the classic built-in function len().

For code A, len() returned the value 1000000000, but for code B the same MemoryError was raised.

After this I decided to get more precision in these tests, so I found a function in the sys module called getsizeof(). With this function I ran the same test on code A:

sys.getsizeof('a' * 1000000000)

The returned result is 1000000037 (in bytes).

  • Question 1: does that mean 0.9313226090744 gigabytes?

So I checked the number of bytes in a string with the single character 'a':

sys.getsizeof('a')

The returned result is 38 (in bytes).

  • Question 2: does that mean a string composed of 1000000000 'a' characters would require 38 * 1000000000 = 38,000,000,000 bytes?

  • Question 3: does that mean we would need 35.390257835388 gigabytes to hold such a string?

I would like to know where the error in this reasoning is, because it makes no sense to me!

Solution

Python objects have a minimal size, the overhead of keeping several pieces of bookkeeping data attached to the object.

A Python str object is no exception. Take a look at the difference between a string with no, one, two and three characters:

>>> import sys
>>> sys.getsizeof('')
37
>>> sys.getsizeof('a')
38
>>> sys.getsizeof('aa')
39
>>> sys.getsizeof('aaa')
40

The Python str object overhead is 37 bytes on my machine, but each character in the string only takes one byte over the fixed overhead.
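This pattern can be checked empirically. The sketch below derives the fixed overhead from getsizeof() itself rather than hard-coding 37, since the exact overhead varies by Python version and platform (modern CPython 3 reports a larger value for an empty str, but ASCII characters still cost one byte each):

```python
import sys

# The size of an empty string is pure bookkeeping overhead.
overhead = sys.getsizeof('')

# Each additional ASCII character should add exactly one byte on top of it.
for n in (1, 10, 1000):
    assert sys.getsizeof('a' * n) == overhead + n

print('fixed overhead:', overhead, 'bytes')
print('per-character cost: 1 byte')
```

Running this confirms that string size grows linearly at one byte per ASCII character, with a single fixed overhead added once per object.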

Thus, a str value with 1000 million characters requires 1000 million bytes + 37 bytes overhead of memory. That is indeed about 0.931 gigabytes.

Your sample code 'B' created ten times more characters, so you needed nearly 10 gigabytes of memory just to hold that one string, not counting the rest of Python, the OS, and whatever else might be running on that machine.
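As a sanity check on these numbers, here is a small sketch that estimates the memory an ASCII str of a given length would need without actually allocating it. The `estimated_str_bytes` helper is hypothetical, written here just for illustration, and assumes one byte per ASCII character plus the fixed overhead reported for an empty str:

```python
import sys

def estimated_str_bytes(n_chars):
    # Assumption: one byte per ASCII character plus the fixed
    # per-object overhead of an empty str on this interpreter.
    return sys.getsizeof('') + n_chars

# Code A's string: about one gigabyte.
print(estimated_str_bytes(10**9) / 2**30)

# Code B's string: roughly ten times that, hence the MemoryError.
print(estimated_str_bytes(10**10) / 2**30)
```

The second figure, close to 10 GiB for a single object, is why code B fails on a machine of this size while code A succeeds.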
