Writing into a NumPy memmap still loads into RAM memory


Problem description

I'm testing NumPy's memmap through IPython Notebook, with the following code:

Ymap = np.memmap('Y.dat', dtype='float32', mode='w+', shape=(int(5e6), int(4e4)))

As you can see, Ymap's shape is pretty large. I'm trying to fill up Ymap like a sparse matrix. I'm not using scipy.sparse matrices because I will eventually need to dot-product it with another dense matrix, which will definitely not fit into memory.
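
(For reference, a product between a memmap this large and another matrix too big for RAM can be computed out of core by multiplying one block of rows at a time; a minimal sketch, where the file names, the inner dimension k, and the block size are all hypothetical:)

import numpy as np

# Hypothetical shapes: Y is the (5e6 x 4e4) memmap, W the other matrix kept on disk as well.
n_rows, n_cols, k = int(5e6), int(4e4), 128
Y = np.memmap('Y.dat', dtype='float32', mode='r', shape=(n_rows, n_cols))
W = np.memmap('W.dat', dtype='float32', mode='r', shape=(n_cols, k))
out = np.memmap('YW.dat', dtype='float32', mode='w+', shape=(n_rows, k))

block = 10000  # rows multiplied per step; tune to available RAM
for start in range(0, n_rows, block):
    stop = min(start + block, n_rows)
    # Only one block of Y's rows is resident at a time; the result block is
    # written straight into the output memmap.
    out[start:stop] = Y[start:stop] @ W
out.flush()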

Anyway, I'm performing a very long series of indexing operations:

import numpy as np

Ymap = np.memmap('Y.dat', dtype='float32', mode='w+', shape=(int(5e6), int(4e4)))
with open("somefile.txt") as somefile:
    for i in range(int(5e6)):
        # Read a line
        line = somefile.readline()
        # For each token in the line, look up its column index j
        # and assign the value 1.0 to Ymap[i, j]
        for token in line.split():
            j = some_dictionary[token]
            Ymap[i, j] = 1.0

These operations somehow quickly eat up my RAM. I thought mem-mapping was basically an out-of-core numpy.ndarray. Am I mistaken? Why is my memory usage sky-rocketing like crazy?

Recommended answer

A (non-anonymous) mmap is a link between a file and RAM that, roughly, guarantees that when the RAM backing the mmap is full, data will be paged out to the given file instead of to the swap disk/file, and that when you msync or munmap the region, the whole region of RAM gets written out to the file. Operating systems typically follow a lazy strategy with respect to disk accesses (and an eager one with respect to RAM): data will remain in memory as long as it fits. This means a process with large mmaps will eat up as much RAM as it can/needs before spilling the rest over to disk.

So you're right that an np.memmap array is an out-of-core array, but it is one that will grab as much RAM cache as it can.
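
One practical way to keep resident memory bounded while filling the array is to flush periodically so the OS can write dirty pages out and reclaim them under pressure. A minimal sketch, reusing the same some_dictionary lookup as in the question and an arbitrary chunk size (note that flush() only writes dirty pages back to the file; the OS may still keep them cached until it needs the RAM elsewhere):

import numpy as np

n_rows, n_cols = int(5e6), int(4e4)
Ymap = np.memmap('Y.dat', dtype='float32', mode='w+', shape=(n_rows, n_cols))
flush_every = 100000  # hypothetical chunk size; tune to your RAM budget

with open("somefile.txt") as somefile:
    for i in range(n_rows):
        line = somefile.readline()
        for token in line.split():
            j = some_dictionary[token]  # token -> column index, as in the question
            Ymap[i, j] = 1.0
        if (i + 1) % flush_every == 0:
            # Write dirty pages back to Y.dat; the kernel can then evict them
            # instead of letting them accumulate as dirty, unevictable pages.
            Ymap.flush()
Ymap.flush()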
