Is it possible to np.concatenate memory-mapped files?


Problem description

I saved a couple of numpy arrays with np.save(), and put together they're quite huge.

Is it possible to load them all as memory-mapped files, and then concatenate and slice through all of them without ever loading anything into memory?

Recommended answer

Using numpy.concatenate apparently loads the arrays into memory. To avoid this, you can easily create a third memmap array in a new file and fill it with the values from the arrays you wish to concatenate. More efficiently, you can also append the new arrays to a file that already exists on disk.
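For reference, a minimal sketch of the first option (writing the result into a brand-new third file) might look like the following, using the same example arrays as in the axis=0 case below; the destination file name 'c.array' is only an illustration:

import numpy as np

a = np.memmap('a.array', dtype='float64', mode='w+', shape=(5000, 1000))
a[:, :] = 111
b = np.memmap('b.array', dtype='float64', mode='w+', shape=(15000, 1000))
b[:, :] = 222

# Destination memmap sized for the result of concatenating a and b along axis 0.
c = np.memmap('c.array', dtype='float64', mode='w+', shape=(20000, 1000))
c[:5000, :] = a   # copy the first array
c[5000:, :] = b   # copy the second array
c.flush()         # push pending writes to disk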

In either case, you must choose the right order for the array (row-major or column-major).

The following examples illustrate how to concatenate along axis 0 and axis 1.

1) Along axis=0

import numpy as np

a = np.memmap('a.array', dtype='float64', mode='w+', shape=(5000, 1000))   # 38.1 MB
a[:, :] = 111
b = np.memmap('b.array', dtype='float64', mode='w+', shape=(15000, 1000))  # 114 MB
b[:, :] = 222

You can define a third array that reads the same file as the first array to be concatenated (here a), in mode r+ (read and write), but with the shape of the final array you want to achieve after concatenation, for example:

# Re-open 'a.array' with the final shape; the file is grown on disk to fit it.
c = np.memmap('a.array', dtype='float64', mode='r+', shape=(20000, 1000), order='C')
c[5000:, :] = b   # write b into the newly added rows

Concatenating along axis=0 does not require passing order='C' because this is already the default order.
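If you want to spot-check the result, you can flush and re-open the grown file read-only with its final shape; the expected values below simply follow from the fill values used above:

c.flush()  # make sure the new rows are written to 'a.array'

check = np.memmap('a.array', dtype='float64', mode='r', shape=(20000, 1000))
assert check[0, 0] == 111      # original contents of a
assert check[5000, 0] == 222   # rows appended from b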

2) Along axis=1

a = np.memmap('a.array', dtype='float64', mode='w+', shape=(5000, 3000))  # 114 MB
a[:, :] = 111
b = np.memmap('b.array', dtype='float64', mode='w+', shape=(5000, 1000))  # 38.1 MB
b[:, :] = 222

The arrays saved on disk are actually flattened, so if you create c with mode='r+' and shape=(5000,4000) without changing the array order, the first 1000 elements of the second row of a will end up in the first row of c. You can easily avoid this by passing order='F' (column-major) to memmap (strictly speaking, this also relies on a's data being laid out column-major on disk; with the constant fill used here the difference is not visible):

# Re-open 'a.array' column-major with the final shape; the file is grown on disk to fit it.
c = np.memmap('a.array', dtype='float64', mode='r+', shape=(5000, 4000), order='F')
c[:, 3000:] = b   # write b into the newly added columns
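As before, you can spot-check the result on disk; the expected values follow from the fill values in this example:

c.flush()  # make sure the new columns are written to 'a.array'

check = np.memmap('a.array', dtype='float64', mode='r', shape=(5000, 4000), order='F')
assert check[0, 0] == 111       # original contents of a
assert check[0, 3000] == 222    # columns appended from b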


Here you now have an updated file 'a.array' with the concatenation result. You can repeat this process to concatenate more arrays, two at a time.
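Finally, to see why the memory layout matters, here is a small self-contained illustration; the file name, shapes and values are made up for the demo and are not part of the original answer:

import numpy as np

# Store a 2x3 array column-major on disk, then reopen it as 2x5 column-major:
# the original columns stay intact and the new columns land at the end of the file.
small = np.memmap('small.array', dtype='float64', mode='w+', shape=(2, 3), order='F')
small[:] = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
small.flush()

grown = np.memmap('small.array', dtype='float64', mode='r+', shape=(2, 5), order='F')
print(grown[:, :3])   # [[0. 1. 2.], [3. 4. 5.]] -- original data preserved
print(grown[:, 3:])   # the two appended columns read back as zeros until filled

# For contrast, reinterpreting the same bytes row-major breaks the original rows apart:
wrong = np.memmap('small.array', dtype='float64', mode='r', shape=(2, 5), order='C')
print(wrong[0])       # [0. 3. 1. 4. 2.] -- rows are interleaved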
