How can I create a numpy .npy file in place on disk?
Question
Is it possible to create an .npy file without allocating the corresponding array in memory first?
I need to create and work with a large numpy array, one that is too big to create in memory. Numpy supports memory mapping, but as far as I can see my options are either:
- Create a memmapped file using numpy.memmap. This creates the file directly on disk without allocating memory, but it doesn't store the metadata, so when I re-map the file later I need to know its dtype, shape, etc. In the following, notice that not specifying the shape results in the memmap being interpreted as a flat array:
In [77]: x=memmap('/tmp/x', int, 'w+', shape=(3,3))
In [78]: x
Out[78]:
memmap([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
In [79]: y=memmap('/tmp/x', int, 'r')
In [80]: y
Out[80]: memmap([0, 0, 0, 0, 0, 0, 0, 0, 0])
- Create an array in memory and save it using numpy.save, after which it can be loaded in memmapped mode. This records the metadata with the array data on disk, but requires that memory be allocated for the entire array at least once.
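For reference, the second option might look like the sketch below. The file path and the small array are hypothetical and chosen only for illustration; the point is that the array `a` has to exist in memory in full before numpy.save can write it.

```python
import os
import tempfile

import numpy as np

# Hypothetical path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "small.npy")

# The whole array must be allocated in memory before it can be saved.
a = np.arange(12, dtype=np.int64).reshape(3, 4)
np.save(path, a)

# Loading in memmapped mode recovers dtype and shape from the .npy header,
# so nothing has to be remembered separately.
b = np.load(path, mmap_mode="r")
print(type(b).__name__, b.dtype, b.shape)
```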
Recommended answer
I had the same question and was disappointed when I read Sven's reply. It seems as though numpy would be missing out on some key functionality if you couldn't have a huge array on file and work on little pieces of it at a time. Your case seems to be close to one of the use cases in the original rationale for making the .npy format (see: http://svn.scipy.org/svn/numpy/trunk/doc/neps/npy-format.txt).
I then ran into numpy.lib.format, which seems to be full of useful goodies. I have no idea why this functionality is not available from the numpy root package. The key advantage over HDF5 is that this ships with numpy.
>>> import numpy.lib.format
>>> print(numpy.lib.format.open_memmap.__doc__)
"""
Open a .npy file as a memory-mapped array.

This may be used to read an existing file or create a new one.

Parameters
----------
filename : str
    The name of the file on disk. This may not be a filelike object.
mode : str, optional
    The mode to open the file with. In addition to the standard file modes,
    'c' is also accepted to mean "copy on write". See `numpy.memmap` for
    the available mode strings.
dtype : dtype, optional
    The data type of the array if we are creating a new file in "write"
    mode.
shape : tuple of int, optional
    The shape of the array if we are creating a new file in "write"
    mode.
fortran_order : bool, optional
    Whether the array should be Fortran-contiguous (True) or
    C-contiguous (False) if we are creating a new file in "write" mode.
version : tuple of int (major, minor)
    If the mode is a "write" mode, then this is the version of the file
    format used to create the file.

Returns
-------
marray : numpy.memmap
    The memory-mapped array.

Raises
------
ValueError
    If the data or the mode is invalid.
IOError
    If the file is not found or cannot be opened correctly.

See Also
--------
numpy.memmap
"""
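Going by that docstring, a minimal sketch of creating a .npy file in place might look like the following. The path, dtype, shape, and chunk size are assumptions made up for the example (kept small so it runs quickly); only open_memmap, its `mode`, `dtype`, and `shape` parameters, and numpy.load with `mmap_mode` come from numpy itself.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Hypothetical path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "big.npy")

# Mode "w+" creates the file, writes the .npy header (dtype, shape, order),
# and returns a memmap over the data region without building the array in RAM.
arr = open_memmap(path, mode="w+", dtype=np.float64, shape=(1000, 200))

# Fill the array one chunk of rows at a time (chunk size is arbitrary here).
for start in range(0, arr.shape[0], 100):
    arr[start:start + 100] = start

arr.flush()  # push pending writes to disk
del arr      # drop the mapping

# Later, re-map the file; dtype and shape are read back from the header.
loaded = np.load(path, mmap_mode="r")
print(loaded.dtype, loaded.shape, loaded[950, 0])
```

Because the header is written when the file is created, the later numpy.load call needs no extra metadata, which is what plain numpy.memmap lacks.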