How to partially load an array saved with numpy.save in Python

Question

I have a multidimensional array saved with numpy.save, and because the array is very big I only want to load part of it along some dimension.

How can I do this in a simple way?

The context is simple and basic:

You have a 5 GB array saved with numpy.save, but you only need access to some parts of the array A[:,:], without loading 5 GB into memory.

ANSWER: use h5py to save and load the data partially. Here is a code sample:

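import sys
import h5py


def main():
    data = read()

    # Pick the slice direction from the first command-line argument.
    if sys.argv[1] == 'x':
        x_slice(data)
    elif sys.argv[1] == 'z':
        z_slice(data)


def read():
    # Open the HDF5 file read-only; indexing it returns a dataset handle,
    # not the data itself.
    f = h5py.File('/tmp/test.hdf5', 'r')
    return f['seismic_volume']


def z_slice(data):
    # Only the requested slice is read from disk.
    return data[:, :, 0]


def x_slice(data):
    return data[0, :, :]


if __name__ == '__main__':
    main()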

Recommended answer

You'd have to intentionally save the array for partial loading; you can't do it generically.

You could, for example, split the array (along one of the dimensions) and save the subarrays with savez. Loading such a file archive is 'lazy': it reads only the subfiles you ask for.
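A minimal sketch of the savez approach described above. The small array, the temporary path, and the chunk_0, chunk_1, ... key names are assumptions made for illustration; the split here is along axis 0.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the large array (the real one would be ~5 GB).
A = np.arange(4 * 3 * 5, dtype=np.float64).reshape(4, 3, 5)

# Illustrative temporary location for the archive.
path = os.path.join(tempfile.mkdtemp(), "chunks.npz")

# Save one sub-array per index along axis 0, under keys chunk_0, chunk_1, ...
np.savez(path, **{f"chunk_{i}": A[i] for i in range(A.shape[0])})

# np.load on an .npz file returns an NpzFile object; a member is only
# read from the archive when it is indexed by name.
with np.load(path) as archive:
    names = archive.files          # list of stored keys, no array data read
    chunk_2 = archive["chunk_2"]   # reads just this one sub-array
```

The granularity is fixed at save time: each stored sub-array is read whole, so the split axis should match how the data will be accessed later.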

h5py is an add-on package that saves and loads data from HDF5 files. That allows for partial reads.
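The question's sample only shows the reading side. A small sketch of both sides, assuming h5py is installed; the dataset name seismic_volume is taken from the question, while the tiny array and the temporary path are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in for the large volume.
volume = np.arange(4 * 3 * 5, dtype=np.float32).reshape(4, 3, 5)

# Illustrative temporary location for the HDF5 file.
path = os.path.join(tempfile.mkdtemp(), "test.hdf5")

# Write the array once as an HDF5 dataset.
with h5py.File(path, "w") as f:
    f.create_dataset("seismic_volume", data=volume)

# Reopen read-only; slicing the dataset reads only the requested region.
with h5py.File(path, "r") as f:
    dset = f["seismic_volume"]   # handle to the dataset, no data loaded yet
    x_slice = dset[0, :, :]      # NumPy array of shape (3, 5)
    z_slice = dset[:, :, 0]      # NumPy array of shape (4, 3)
```

Slicing an h5py dataset returns an ordinary in-memory NumPy array, so the slices stay valid after the file is closed.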

numpy.memmap is another option, treating a file as memory that stores an array.
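A rough sketch of numpy.memmap on a raw binary file. The array contents and the temporary path are assumptions for illustration. A raw file written with tofile has no header, so the dtype and shape have to be supplied again when mapping it.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the large array.
A = np.arange(4 * 3 * 5, dtype=np.float64).reshape(4, 3, 5)

# Illustrative temporary location for the raw file.
path = os.path.join(tempfile.mkdtemp(), "array.raw")

# Write the raw bytes in C order; no dtype or shape information is stored.
A.tofile(path)

# Map the file read-only; dtype and shape must match what was written.
mm = np.memmap(path, dtype=np.float64, mode="r", shape=(4, 3, 5))

# Slicing touches only the needed parts of the file; np.array copies
# the selected region into ordinary memory.
part = np.array(mm[1, :, 2:4])

# Drop the reference so the mapping can be released.
del mm
```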

Look up the docs for these, as well as previous SO questions.

Fastest save and load options for a numpy array

Writing a large hdf5 dataset using h5py

To elaborate on the hold: there are minor points that aren't clear. What exactly do you mean by 'load some dimension'? The simplest interpretation is that you want A[0,...] or A[3:10,...]. The other is the implication of 'simple way'. Does that mean you already have a complex way and want a simpler one? Or just that you don't want to rewrite the numpy.load function to do the task?

Otherwise I think the question is reasonably clear, and the simple answer is: no, there isn't a simple way.

I'm tempted to reopen the question so other experienced numpy posters can weigh in.

I should have reviewed the load docs (the OP should have as well!). As ali_m commented, there is a memory-map mode. The docs say:

mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional

    If not None, then memory-map the file, using the given mode
    (see `numpy.memmap` for a detailed description of the modes).
    A memory-mapped array is kept on disk. However, it can be accessed
    and sliced like any ndarray.  Memory mapping is especially useful for
    accessing small fragments of large files without reading the entire
    file into memory.
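As a sketch of what the quoted mmap_mode option looks like in practice for a file written by numpy.save (the small array and the temporary path are assumptions for illustration):

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the 5 GB array from the question.
A = np.arange(4 * 3 * 5, dtype=np.float64).reshape(4, 3, 5)

# Illustrative temporary location for the .npy file.
path = os.path.join(tempfile.mkdtemp(), "big.npy")

# Ordinary save; nothing special is needed at write time.
np.save(path, A)

# With mmap_mode set, np.load returns a memory-mapped array instead of
# reading the whole file; dtype and shape come from the .npy header.
lazy = np.load(path, mmap_mode="r")

# Slice the mapped array, then copy just that region into memory.
part = np.array(lazy[0, :, :])

# Drop the reference so the mapping can be released.
del lazy
```

This applies to a single .npy file; the array stays on disk and only the sliced region is copied.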

How does numpy handle mmap's over npz files? (I dug into this months ago, but forgot the option.)

Python memory mapping
