Is it possible to np.concatenate memory-mapped files?


Problem description


I saved a couple of numpy arrays with np.save(), and put together they're quite huge.


Is it possible to load them all as memory-mapped files, and then concatenate and slice through all of them without ever loading anything into memory?

Recommended answer


Using numpy.concatenate apparently loads the arrays into memory. To avoid this you can easily create a third memmap array in a new file and read the values from the arrays you wish to concatenate. More efficiently, you can also append the new arrays to an already existing file on disk.

In any case you should take care to choose the right order for the arrays (row-major or column-major).
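The difference between the two orders can be seen by flattening a small array both ways. This is a minimal illustration I am adding for clarity, not part of the original answer:

```python
import numpy as np

# A small 2x3 array to show how the two memory layouts flatten.
a = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

row_major = a.ravel(order='C')  # rows are contiguous on disk
col_major = a.ravel(order='F')  # columns are contiguous on disk

print(row_major)  # [0 1 2 3 4 5]
print(col_major)  # [0 3 1 4 2 5]
```

The on-disk byte sequence of a memmap follows exactly this flattening, which is why the order argument matters when a file is reinterpreted with a new shape.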


Let's illustrate this with two cases in 2D arrays.

1) Concatenating along axis=0

import numpy as np

a = np.memmap('a.array', dtype='float64', mode='w+', shape=(5000, 1000))  # 38.1 MB
a[:, :] = 111
b = np.memmap('b.array', dtype='float64', mode='w+', shape=(15000, 1000)) # 114 MB
b[:, :] = 222


You can define a third array reading the same file as the first array to be concatenated (here a) in mode r+ (open the existing file for reading and writing), but with the shape of the final array you want to achieve after concatenation, like:

c = np.memmap('a.array', dtype='float64', mode='r+', shape=(20000, 1000), order='C')
c[5000:, :] = b

Concatenating along axis=0 does not require passing order='C', because this is already the default order.
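Scaled down to tiny shapes so it runs in an instant, the axis=0 trick above can be verified end-to-end. The small shapes are my choice; the file names and steps mirror the answer:

```python
import numpy as np

# Tiny versions of the arrays from the answer.
a = np.memmap('a.array', dtype='float64', mode='w+', shape=(5, 3))
a[:, :] = 111
b = np.memmap('b.array', dtype='float64', mode='w+', shape=(15, 3))
b[:, :] = 222

# Reopen a.array with the final shape; in mode 'r+' numpy grows the file
# on disk, and order='C' keeps a's rows as the first 5 rows of c.
c = np.memmap('a.array', dtype='float64', mode='r+', shape=(20, 3), order='C')
c[5:, :] = b

assert (c[:5] == 111).all() and (c[5:] == 222).all()
```

Note that only file-backed pages are touched here; at no point is a full concatenated array materialized in memory.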

2) Concatenating along axis=1

a = np.memmap('a.array', dtype='float64', mode='w+', shape=(5000, 3000))  # 114 MB
a[:, :] = 111
b = np.memmap('b.array', dtype='float64', mode='w+', shape=(5000, 1000))  # 38.1 MB
b[:, :] = 222

The arrays saved on disk are actually flattened, so if you create c with mode=r+ and shape=(5000,4000) without changing the array order, the first 1000 elements of the second row of a will end up in the first row of c. You can easily avoid this by passing order='F' (column-major) to memmap (note that this reinterprets the bytes already in the file as column-major too, so a's own data only stays consistent if a was written in column-major order as well):

c = np.memmap('a.array', dtype='float64', mode='r+', shape=(5000, 4000), order='F')
c[:, 3000:] = b


Here you have an updated file 'a.array' with the concatenation result. You may repeat this process to concatenate two arrays at a time.
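A tiny, self-contained sketch of the axis=1 case (shapes shrunk and file names a2.array/b2.array chosen by me to avoid clobbering the files above). Here a is created with order='F' and filled with distinct values, so we can check that its data keeps its meaning when the file is reinterpreted column-major:

```python
import numpy as np

# Column-major storage for a, so the later reinterpretation is consistent.
a = np.memmap('a2.array', dtype='float64', mode='w+', shape=(5, 3), order='F')
a[:, :] = np.arange(15).reshape(5, 3)  # distinct values, to prove layout survives
b = np.memmap('b2.array', dtype='float64', mode='w+', shape=(5, 1))
b[:, :] = 222

# Reopen with the final shape in column-major order; 'r+' grows the file,
# and a's 3 columns become the first 3 columns of c unchanged.
c = np.memmap('a2.array', dtype='float64', mode='r+', shape=(5, 4), order='F')
c[:, 3:] = b

assert (c[:, :3] == np.arange(15).reshape(5, 3)).all()
assert (c[:, 3:] == 222).all()
```

Had a been stored row-major, reinterpreting the same bytes with order='F' would scramble its values, which is exactly the pitfall the paragraph above describes.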

Related question:

  • Working with big data in python and numpy, not enough ram, how to save partial results on disc?

