How can I create a numpy .npy file in place on disk?


Question

Is it possible to create an .npy file without first allocating the corresponding array in memory?

I need to create and work with a large numpy array, too big to create in memory. Numpy supports memory mapping, but as far as I can see my options are either:

  1. Create a memmapped file using numpy.memmap. This creates the file directly on disk without allocating memory, but it does not store any metadata, so when I re-map the file later I need to know its dtype, shape, etc. In the following, notice that not specifying the shape results in the memmap being interpreted as a flat array:

In [77]: x = memmap('/tmp/x', int, 'w+', shape=(3, 3))

In [78]: x
Out[78]:
memmap([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [79]: y = memmap('/tmp/x', int, 'r')

In [80]: y
Out[80]: memmap([0, 0, 0, 0, 0, 0, 0, 0, 0])

  2. Create an array in memory, save it using numpy.save, after which it can be loaded in memmapped mode. This records the metadata along with the array data on disk, but requires that memory be allocated for the entire array at least once.
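The trade-off between these two options can be seen in a small sketch. The temporary directory and the file names `x.dat` and `x.npy` are illustrative choices, not taken from the question:

```python
import os
import tempfile

import numpy as np

tmpdir = tempfile.mkdtemp()  # illustrative scratch directory

# Option 1: raw memmap. The file holds only the data bytes, so the dtype
# and shape must be supplied again whenever the file is re-mapped.
raw_path = os.path.join(tmpdir, "x.dat")
x = np.memmap(raw_path, dtype=np.int64, mode="w+", shape=(3, 3))
x[:] = 1
x.flush()
del x

y = np.memmap(raw_path, dtype=np.int64, mode="r", shape=(3, 3))
raw_shape = y.shape  # (3, 3) only because it was passed explicitly

# Option 2: np.save writes a header containing dtype and shape, but the
# full array has to exist in memory before it can be saved.
npy_path = os.path.join(tmpdir, "x.npy")
np.save(npy_path, np.ones((3, 3), dtype=np.int64))
z = np.load(npy_path, mmap_mode="r")
npy_shape, npy_dtype = z.shape, z.dtype  # recovered from the file header
```

In the second case nothing besides the path is needed to re-open the file, which is the behaviour the question is after, minus the in-memory allocation.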

    Recommended answer

    I had the same question and was disappointed when I read Sven's reply. It seems numpy would be missing some key functionality if you couldn't keep a huge array in a file and work on small pieces of it at a time. Your case seems close to one of the use cases in the original rationale for the .npy format (see: http://svn.scipy.org/svn/numpy/trunk/doc/neps/npy-format.txt).

    I then ran into numpy.lib.format, which seems to be full of useful goodies. I have no idea why this functionality is not available from the numpy root package. The key advantage over HDF5 is that this ships with numpy.

    >>> import numpy.lib.format
    >>> print(numpy.lib.format.open_memmap.__doc__)
    
    """
    Open a .npy file as a memory-mapped array.
    
    This may be used to read an existing file or create a new one.
    
    Parameters
    ----------
    filename : str
        The name of the file on disk. This may not be a filelike object.
    mode : str, optional
        The mode to open the file with. In addition to the standard file modes,
        'c' is also accepted to mean "copy on write". See `numpy.memmap` for
        the available mode strings.
    dtype : dtype, optional
        The data type of the array if we are creating a new file in "write"
        mode.
    shape : tuple of int, optional
        The shape of the array if we are creating a new file in "write"
        mode.
    fortran_order : bool, optional
        Whether the array should be Fortran-contiguous (True) or
        C-contiguous (False) if we are creating a new file in "write" mode.
    version : tuple of int (major, minor)
        If the mode is a "write" mode, then this is the version of the file
        format used to create the file.
    
    Returns
    -------
    marray : numpy.memmap
        The memory-mapped array.
    
    Raises
    ------
    ValueError
        If the data or the mode is invalid.
    IOError
        If the file is not found or cannot be opened correctly.
    
    See Also
    --------
    numpy.memmap
    """
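Going by the docstring above, creating a .npy file in place might look like the following minimal sketch. The path `big.npy`, the array shape, and the chunk size are assumptions chosen for illustration:

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Illustrative location; any writable path ending in .npy would do.
path = os.path.join(tempfile.mkdtemp(), "big.npy")

# Create the .npy file directly on disk. The header (dtype, shape, order)
# is written up front and the data region is memory-mapped, so the full
# array is never allocated in RAM.
arr = open_memmap(path, mode="w+", dtype=np.float64, shape=(1000, 100))

# Fill the array one chunk of rows at a time.
chunk = 250
for start in range(0, arr.shape[0], chunk):
    arr[start:start + chunk] = start

arr.flush()  # push pending changes to disk
del arr      # drop the mapping

# Re-open later without specifying dtype or shape: both come from the header.
loaded = np.load(path, mmap_mode="r")
```

Because the result is an ordinary .npy file, it can also be read with a plain `np.load(path)` when the data does fit in memory.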
    
