Writing into a NumPy memmap still loads into RAM memory
Question
I'm testing NumPy's memmap in an IPython Notebook with the following code:
Ymap = np.memmap('Y.dat', dtype='float32', mode='w+', shape=(5000000, 40000))
As you can see, Ymap's shape is pretty large. I'm trying to fill up Ymap like a sparse matrix. I'm not using scipy.sparse matrices because I will eventually need to dot-product it with another dense matrix, which will definitely not fit into memory.
Anyway, I'm performing a very long series of indexing operations:
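import numpy as np

n_rows, n_cols = 5000000, 40000  # shape and loop bound must be ints, not floats like 5e6
Ymap = np.memmap('Y.dat', dtype='float32', mode='w+', shape=(n_rows, n_cols))
# some_dictionary (defined elsewhere) maps each token to its column index j
with open("somefile.txt", 'rb') as somefile:
    for i in range(n_rows):
        # Read a line
        line = somefile.readline()
        # For each token in the line, look up its j value
        # and assign the value 1.0 to Ymap[i, j]
        for token in line.split():
            j = some_dictionary[token]
            Ymap[i, j] = 1.0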
These operations somehow quickly eat up my RAM. I thought mem-mapping was basically an out-of-core numpy.ndarray. Am I mistaken? Why is my memory usage skyrocketing?
Recommended answer
A (non-anonymous) mmap is a link between a file and RAM that, roughly, guarantees two things: when the RAM backing the mmap is full, data will be paged out to the given file instead of to the swap disk/file, and when you msync or munmap it, the whole region of RAM gets written out to the file. Operating systems typically follow a lazy strategy with respect to disk accesses (or an eager one with respect to RAM): data will remain in memory as long as it fits. This means a process with large mmaps will eat up as much RAM as it can or needs before spilling the rest over to disk.
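To make the msync/munmap point concrete, here is a minimal sketch using np.memmap. The small shape, the temporary file path, and the variable names are assumptions chosen so the demo runs anywhere; they are not from the question. In NumPy, flush() on a memmap asks for modified pages to be written back to the file, which is the closest counterpart to msync here.

```python
import os
import tempfile

import numpy as np

# Hypothetical small shape and temp path, for illustration only.
shape = (1000, 100)
path = os.path.join(tempfile.mkdtemp(), "demo.dat")

arr = np.memmap(path, dtype="float32", mode="w+", shape=shape)
arr[3, 7] = 1.0   # modifies a page in RAM; the OS decides when it reaches disk
arr.flush()       # request that modified pages be written back to the file
del arr           # drop the array object and, with it, the mapping

# Reopen read-only: the value is now read from the file on disk.
reopened = np.memmap(path, dtype="float32", mode="r", shape=shape)
value = float(reopened[3, 7])
file_size = os.path.getsize(path)  # rows * cols * 4 bytes for float32
```

Note that flushing does not by itself force the pages out of RAM; it only makes them clean, so the OS can drop them later without further disk writes.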
So you're right that an np.memmap array is an out-of-core array, but it is one that will grab as much RAM cache as it can.
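If the growth bothers you, one possible workaround (my own suggestion, not something NumPy does for you) is to map only a block of rows at a time, flush it, and drop the mapping before moving on. The sketch below assumes a hypothetical helper fill_in_blocks and a dict entries_by_row standing in for the token lookup in the question. It limits how much of the file the process has mapped at once; the OS may still keep recently written pages in its cache.

```python
import os
import tempfile

import numpy as np


def fill_in_blocks(path, shape, entries_by_row, block_rows):
    """Set Y[i, j] = 1.0 for every j in entries_by_row[i], one row block at a time."""
    n_rows, n_cols = shape
    itemsize = np.dtype("float32").itemsize
    # Create the zero-filled backing file once.
    np.memmap(path, dtype="float32", mode="w+", shape=shape).flush()
    for start in range(0, n_rows, block_rows):
        stop = min(start + block_rows, n_rows)
        # Map only rows [start, stop); offset is given in bytes.
        block = np.memmap(path, dtype="float32", mode="r+",
                          shape=(stop - start, n_cols),
                          offset=start * n_cols * itemsize)
        for i in range(start, stop):
            for j in entries_by_row.get(i, ()):
                block[i - start, j] = 1.0
        block.flush()  # write this block's modified pages back to the file
        del block      # release the mapping before opening the next block


# Tiny usage example with made-up data.
path = os.path.join(tempfile.mkdtemp(), "Y_demo.dat")
entries = {0: [1, 3], 5: [0], 9: [4]}
fill_in_blocks(path, (10, 5), entries, block_rows=4)
result = np.memmap(path, dtype="float32", mode="r", shape=(10, 5))
total = float(result.sum())
```

The block size is a tuning knob: larger blocks mean fewer remaps, smaller blocks mean less of the file is mapped into the process at any moment.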