Data size in memory vs. on disk


Question

How does the RAM required to store data in memory compare to the disk space required to store the same data in a file? Or is there no generalized correlation?

For example, say I simply have a billion single-precision (4-byte) floating point values. Stored in binary form, that would be 4 billion bytes, or about 3.7 GiB, on disk (not including headers and such). Then say I read those values into a list in Python... how much RAM should I expect that to require?
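A quick check of the arithmetic in the question, as a sketch: `struct.calcsize` gives the packed size of a C `float` (format `'f'`) and a C `double` (format `'d'`) on the current platform. A Python `float` is a C double, so reading 4-byte values into Python floats already doubles the raw payload before any object overhead is counted.

```python
import struct

n = 1_000_000_000  # one billion values, as in the question

# Raw packed size only: no file headers, no Python object overhead
for code, name in (("f", "float32"), ("d", "float64")):
    raw_bytes = n * struct.calcsize(code)
    print(f"{name}: {raw_bytes:,} bytes = {raw_bytes / 2**30:.2f} GiB")
```

This prints 4,000,000,000 bytes (3.73 GiB) for the 4-byte floats and 8,000,000,000 bytes (7.45 GiB) for doubles.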

Answer

Python Object Data Size

If the data is stored in a Python object, the object carries some extra bookkeeping data in memory on top of the actual values.

This is easy to test by measuring the same doubles stored in different forms.
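For a single value, a minimal check, assuming CPython: each Python `float` is a full object whose header (reference count and type pointer) wraps the 8-byte C double, so `sys.getsizeof` reports more than the raw value size. The exact figure depends on the interpreter and platform; on a typical 64-bit CPython it is 24 bytes versus 8.

```python
import struct
import sys

raw = struct.calcsize("d")   # bytes for one packed C double
boxed = sys.getsizeof(1.0)   # bytes for one Python float object (header + double)
print(f"raw double: {raw} bytes, Python float object: {boxed} bytes")
print(f"per-value overhead: {boxed - raw} bytes")
```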

It is interesting to note that the fixed overhead of the Python object is significant for small amounts of data but quickly becomes negligible as the data grows.
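One way to see this numerically, as a sketch: compare `sys.getsizeof` of an `array.array('d')`, which stores the doubles contiguously much like a binary file would, with its raw payload. The fixed header is a large fraction of a tiny array and a vanishing fraction of a large one. The helper name `overhead_ratio` is hypothetical, chosen for illustration.

```python
import array
import sys

def overhead_ratio(n: int) -> float:
    """Extra bytes beyond the raw payload, as a fraction of that payload."""
    arr = array.array("d", [0.0]) * n   # n doubles stored contiguously
    raw = len(arr) * arr.itemsize       # size of the same values in a binary file
    return (sys.getsizeof(arr) - raw) / raw

for n in (1, 10, 1_000, 1_000_000):
    print(f"n={n:>9,}: overhead = {overhead_ratio(n):.2%}")
```

The absolute overhead stays constant (the array object's header), so the ratio falls roughly in proportion to 1/n.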

Here is the IPython code used to generate the plot:

%matplotlib inline
import array
import random
import sys

import matplotlib.pyplot as plt

max_doubles = 10000

# Sizes in bytes for each n, one list per representation
raw_size = []     # raw payload: what the doubles occupy in a binary file
array_size = []   # array.array('d')
bytes_size = []   # bytes object holding the packed doubles
list_size = []    # list/set/tuple: container only, excludes the float objects
set_size = []
tuple_size = []
# Start at 1: zero cannot be drawn on a log-log plot
size_range = range(1, max_doubles + 1)

# Measure each representation for n = 1 .. max_doubles doubles
for n in size_range:
    double_array = array.array('d', [random.random() for _ in range(n)])
    double_bytes = double_array.tobytes()  # tostring() in Python 2, removed in 3.9
    double_list = double_array.tolist()
    double_set = set(double_list)
    double_tuple = tuple(double_list)

    # number of elements * bytes per element
    raw_size.append(len(double_array) * double_array.itemsize)
    array_size.append(sys.getsizeof(double_array))
    bytes_size.append(sys.getsizeof(double_bytes))
    list_size.append(sys.getsizeof(double_list))
    set_size.append(sys.getsizeof(double_set))
    tuple_size.append(sys.getsizeof(double_tuple))

# Display all curves on log-log axes
plt.figure(figsize=(10, 8))
plt.title('The size of data in various forms', fontsize=20)
plt.xlabel('Data Size (number of doubles, 8 bytes each)', fontsize=15)
plt.ylabel('Memory Size (bytes)', fontsize=15)
plt.loglog(
    size_range, raw_size,
    size_range, array_size,
    size_range, bytes_size,
    size_range, list_size,
    size_range, set_size,
    size_range, tuple_size,
)
plt.legend(['Raw (Disk)', 'Array', 'Bytes', 'List', 'Set', 'Tuple'], fontsize=15, loc='best')
plt.show()
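A caveat when reading the plot: `sys.getsizeof` counts only the memory directly attributed to an object, not the objects it refers to. For `array.array` and the packed bytes the doubles live inside the object, but a list, tuple or set holds pointers to separate `float` objects, so those curves leave out most of the real cost. Below is a rough estimate for the question's list of a billion floats, assuming CPython and distinct float objects; the function name `estimate_float_list_bytes` is hypothetical, and allocator padding and list over-allocation are ignored.

```python
import struct
import sys

def estimate_float_list_bytes(n: int) -> int:
    """Approximate memory for a list of n distinct Python floats (CPython)."""
    header = sys.getsizeof([])       # empty list object
    pointer = struct.calcsize("P")   # one pointer slot per element
    float_obj = sys.getsizeof(1.0)   # each element is its own float object
    return header + n * (pointer + float_obj)

n = 1_000_000_000
raw = n * struct.calcsize("d")
est = estimate_float_list_bytes(n)
print(f"raw doubles:    {raw / 2**30:6.1f} GiB")
print(f"list of floats: {est / 2**30:6.1f} GiB (about {est / raw:.1f}x the raw doubles)")
```

On a typical 64-bit CPython this comes to about 32 bytes per element (an 8-byte pointer plus a 24-byte float object), roughly 30 GiB, or about eight times the 3.7 GiB of 4-byte floats on disk. Reading the file into an `array.array('f')` instead typically keeps each value at 4 bytes.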
