Python memory mapping

Question

I am working with big data and have matrices of sizes like 2000x100000. To work faster, I tried using numpy.memmap so that I don't have to hold such large matrices in memory, given my RAM limitations. The problem is that when I store the same matrix in two variables, one loaded with numpy.load and the other with numpy.memmap, the contents are not the same. Is this normal? I am using the same data type in memmap as in my data. Example:

import numpy
A1 = numpy.load('mydata.npy')
A2 = numpy.memmap('mydata.npy', dtype=numpy.float64, mode='r', shape=(2000, 2000))
print(A1[0, 0])  # 0.0
print(A2[0, 0])  # 1.8758506894003703e-309

Those are the contents of the first element of the array in both cases. The correct value is 0, but I get this weird number when using memmap. Thank you.

Recommended answer

The NPY format is not simply a dump of the array's data to a file. It includes a header that contains, among other things, the metadata that defines the array's data type and shape. When you use memmap directly as you have done, the memory map starts at the first byte of the file, so it reads the header bytes as if they were array data. To create a memory-mapped view of an NPY file, use the mmap_mode option of np.load.

Here's an example (with numpy imported as np). First, create an NPY file:

In [1]: a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

In [2]: np.save('a.npy', a)

Read it back with np.load:

In [3]: a1 = np.load('a.npy')

In [4]: a1
Out[4]: 
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

View the file incorrectly with memmap:

In [5]: a2 = np.memmap('a.npy', dtype=np.float64, mode='r', shape=(2, 3))

In [6]: a2
Out[6]: 
memmap([[  1.87585069e-309,   1.17119999e+171,   5.22741680e-037],
       [  8.44740097e+252,   2.65141232e+180,   9.92152605e+247]])
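
The first bogus value is not random. As a small sketch, assuming a little-endian machine and an NPY file of format version 1.0, the first 8 bytes of the file are the magic string plus two version bytes, and reading them as a float64 gives the number seen above:

```python
import numpy as np

# Every NPY file begins with the magic string b'\x93NUMPY' followed by
# two version bytes; version (1, 0) is assumed here.
magic_and_version = b'\x93NUMPY\x01\x00'

# Interpret those 8 bytes as one little-endian float64, which is what a
# raw memmap starting at offset 0 does on a little-endian machine.
first_value = np.frombuffer(magic_and_version, dtype='<f8')[0]
print(first_value)  # a subnormal number, about 1.8758e-309
```

The remaining garbage values come from the same effect applied to the rest of the header text.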

Create a memmap using np.load with the option mmap_mode='r':

In [7]: a3 = np.load('a.npy', mmap_mode='r')

In [8]: a3
Out[8]: 
memmap([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
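
If np.memmap has to be called directly for some reason, one possible approach is to read the header first and pass the resulting byte offset to memmap. The following is only a sketch: it uses the helpers in numpy.lib.format, handles only format versions 1.0 and 2.0, and writes to a temporary path that is made up for the illustration. In most cases np.load with mmap_mode is the simpler choice.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'a.npy')
a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
np.save(path, a)

with open(path, 'rb') as f:
    # read_magic consumes the magic string and returns (major, minor).
    version = npy_format.read_magic(f)
    if version == (1, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_1_0(f)
    elif version == (2, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_2_0(f)
    else:
        raise ValueError(f'unhandled NPY version: {version}')
    offset = f.tell()  # position of the first data byte after the header

# Map only the data region, using the metadata read from the header.
a4 = np.memmap(path, dtype=dtype, mode='r', shape=shape, offset=offset,
               order='F' if fortran_order else 'C')
print(np.array_equal(a4, a))  # True
```

Taking the dtype, shape, and memory order from the header rather than hard-coding them also avoids a mismatch between the shape passed to memmap and the shape actually stored in the file.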
