Efficient way to partially read a large numpy file?

Question

I have a huge numpy 3D tensor stored in a file on my disk, which I normally read using np.load. This is a binary .npy file. When I use np.load, I quickly end up using most of my memory.

Luckily, at every run of the program, I only require a certain slice of the huge tensor. The slice is of a fixed size and its dimensions are provided from an external module.

What's the best way to do this? The only way I could figure out is somehow storing this numpy matrix into a MySQL database. But I'm sure there are much better / easier ways. I'll also be happy to build my 3D tensor file differently if it will help.

Does the answer change if my tensor is sparse in nature?

Recommended answer

Use numpy.load as normal, but be sure to specify the mmap_mode keyword so that the array is kept on disk and only the parts you actually access are loaded into memory.

mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional
    If not None, then memory-map the file, using the given mode (see numpy.memmap for a detailed description of the modes). A memory-mapped array is kept on disk. However, it can be accessed and sliced like any ndarray. Memory mapping is especially useful for accessing small fragments of large files without reading the entire file into memory.
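A minimal sketch of this applied to the question's use case, assuming the slice arrives as a start index and a fixed size along each of the three axes. The helper name read_block and its start/size arguments are hypothetical, and a tiny array stands in for the huge tensor file:

```python
import os
import tempfile

import numpy as np


def read_block(path, start, size):
    """Return an in-memory copy of one fixed-size block of a 3-D .npy array.

    `start` and `size` are hypothetical (i, j, k) tuples standing in for
    whatever the external module provides.
    """
    # mmap_mode="r" maps the file read-only; array data is paged in lazily.
    arr = np.load(path, mmap_mode="r")
    # Basic slicing of a memmap gives another memmap view, still on disk.
    view = arr[start[0]:start[0] + size[0],
               start[1]:start[1] + size[1],
               start[2]:start[2] + size[2]]
    # np.array() copies only the selected elements into a regular ndarray.
    return np.array(view)


if __name__ == "__main__":
    # Small stand-in for the huge tensor file.
    path = os.path.join(tempfile.mkdtemp(), "tensor.npy")
    np.save(path, np.arange(4 * 5 * 6, dtype=np.float32).reshape(4, 5, 6))
    block = read_block(path, (1, 2, 3), (2, 2, 2))
    print(block.shape)  # (2, 2, 2)
```

The final copy is optional: the memmap view can be read directly, but np.array() turns the block into an ordinary writable array that no longer depends on the open file.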

The modes are described in numpy.memmap:

mode : {'r+', 'r', 'w+', 'c'}, optional
    The file is opened in this mode:
    'r'   Open existing file for reading only.
    'r+'  Open existing file for reading and writing.
    'w+'  Create or overwrite existing file for reading and writing.
    'c'   Copy-on-write: assignments affect data in memory, but changes are not saved to disk. The file on disk is read-only.
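The following sketch illustrates how the 'c', 'r+' and 'r' modes behave when the array is opened through np.load. The file name and the small zero-filled array are placeholders chosen only for illustration:

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "modes.npy")
np.save(path, np.zeros((2, 3, 4)))

# 'c' (copy-on-write): the assignment changes only the in-memory pages.
cow = np.load(path, mmap_mode="c")
cow[0, 0, 0] = 1.0
print(cow[0, 0, 0])              # 1.0
print(np.load(path)[0, 0, 0])    # 0.0 (the file on disk is unchanged)

# 'r+' (read/write): the assignment is written back to the file.
rw = np.load(path, mmap_mode="r+")
rw[0, 0, 0] = 2.0
rw.flush()                       # push pending changes to disk
print(np.load(path)[0, 0, 0])    # 2.0

# 'r' (read-only): the mapped array is not writeable at all.
ro = np.load(path, mmap_mode="r")
print(ro.flags.writeable)        # False
```

For the read-a-slice use case in the question, 'r' is the safest choice; 'c' is handy when the slice needs scratch modifications that must never reach the file.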

*Be sure not to use 'w+' mode, as it will erase your file's contents.
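'w+' is meant for creating a new file, not for reading an existing one. If rebuilding the tensor file is an option, as the question allows, one possible approach (an assumption, not part of the original answer) is numpy.lib.format.open_memmap with mode="w+", which writes a valid .npy header and maps the file so it can be filled one slab at a time. The shape and file name below are hypothetical:

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

shape = (8, 100, 100)  # hypothetical tensor shape
path = os.path.join(tempfile.mkdtemp(), "big_tensor.npy")

# mode="w+" creates (or overwrites!) the file, writes a .npy header,
# and maps it so it can be filled without holding it all in RAM.
out = open_memmap(path, mode="w+", dtype=np.float32, shape=shape)
for i in range(shape[0]):
    # Fill one 2-D slab at a time; only this slab needs to be in memory.
    out[i] = np.full(shape[1:], i, dtype=np.float32)
out.flush()
del out  # drop the mapping

# Later runs reopen the file read-only, as the answer recommends.
tensor = np.load(path, mmap_mode="r")
print(tensor.shape, tensor[5, 0, 0])  # (8, 100, 100) 5.0
```

Because the result is an ordinary .npy file, np.load(path, mmap_mode="r") works on it unchanged.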