Memory growth with broadcast operations in NumPy

Question

I am using NumPy to handle some large data matrices (around 50 GB in size). The machine I am running this code on has 128 GB of RAM, so simple linear operations of this magnitude shouldn't be a problem memory-wise.

However, memory usage grows enormously (to more than 100 GB) when I run the following code in Python:

import numpy as np

# memory allocations (everything works fine)
a = np.zeros((1192953, 192, 32), dtype='f8')  # ~58.6 GB of float64
b = np.zeros((1192953, 192), dtype='f8')      # ~1.8 GB
c = np.zeros((192, 32), dtype='f8')           # ~49 KB

a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :] # memory explodes here

Note that the initial memory allocations succeed without any problems. However, when I perform the subtraction with broadcasting, memory usage grows to more than 100 GB. I always thought that broadcasting avoided extra memory allocations, but now I am not sure this is always the case.
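A quick size check helps put the numbers in context. The sketch below only multiplies out the shapes from the code above and allocates nothing. It assumes NumPy 1.20 or later for `np.broadcast_shapes`, and `nbytes_for` is a hypothetical helper introduced here. It suggests that the result of `b[:, :, np.newaxis] - c` alone is as large as `a`. If that result is materialized before being copied into `a[:]`, the peak would be roughly twice the size of `a`.

```python
import numpy as np

def nbytes_for(shape, dtype="f8"):
    """Hypothetical helper: bytes a dense array of this shape/dtype would use (allocates nothing)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * np.dtype(dtype).itemsize

N = 1192953
a_bytes = nbytes_for((N, 192, 32))  # destination array a
b_bytes = nbytes_for((N, 192))      # input b
c_bytes = nbytes_for((192, 32))     # input c

# Shape of the broadcast result of b[:, :, np.newaxis] - c, i.e. a's full shape.
temp_shape = np.broadcast_shapes((N, 192, 1), (192, 32))
temp_bytes = nbytes_for(temp_shape)

# Rough peak if the temporary and all three arrays are resident at once.
peak_bytes = a_bytes + b_bytes + c_bytes + temp_bytes

print("a:    %.1f GB" % (a_bytes / 1e9))     # a:    58.6 GB
print("temp: %.1f GB" % (temp_bytes / 1e9))  # temp: 58.6 GB
print("peak: %.1f GB" % (peak_bytes / 1e9))  # peak: 119.1 GB
```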

So, can someone explain why this memory growth happens, and how the code above could be rewritten using more memory-efficient constructs?

I am running the code in Python 2.7 within IPython Notebook.

Answer

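@rth's suggestion to do the operation in smaller batches is a good one. The growth comes from how the assignment is evaluated: a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :] first computes the right-hand side into a new temporary array with the full shape of a, and only then copies it into a. Broadcasting avoids copying the inputs, not allocating the result. You could also try using the function np.subtract and passing it the destination array, which avoids creating that additional temporary array. You also don't need to index c as c[np.newaxis, :, :]. Although c is only 2-d, broadcasting automatically prepends size-1 axes to the array with fewer dimensions, so c already broadcasts against b[:, :, np.newaxis].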
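The small sketch below uses toy shapes in place of the real ones to illustrate those three points. `b[:, :, np.newaxis]` is a view that shares memory with `b`. Broadcasting `c` to three dimensions uses a stride of 0 instead of copying. `np.subtract(..., out=a)` writes into `a` and returns that same object. It also checks that dropping the explicit `np.newaxis` on `c` gives an identical result.

```python
import numpy as np

b = np.arange(6.0).reshape(2, 3)
c = np.arange(12.0).reshape(3, 4)

b3 = b[:, :, np.newaxis]            # shape (2, 3, 1), a view: no data copied
c_b = np.broadcast_to(c, (2, 3, 4))  # read-only view, stride 0 along the new axis

print(np.shares_memory(b3, b))  # True
print(c_b.strides)              # (0, 32, 8)

# The explicit leading axis on c is redundant: broadcasting prepends it.
print(np.array_equal(b3 - c, b3 - c[np.newaxis, :, :]))  # True

# With out=, the ufunc writes into the existing array and returns it.
a = np.empty((2, 3, 4))
res = np.subtract(b3, c, out=a)
print(res is a)  # True
```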

So instead of

a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :] # memory explodes here

try

np.subtract(b[:, :, np.newaxis], c, a)

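The third argument of np.subtract is the destination array (its out parameter), so np.subtract(b[:, :, np.newaxis], c, out=a) is equivalent and writes the result directly into a.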
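For comparison, @rth's batching idea can be sketched as below. `subtract_in_chunks` and its `chunk_rows` parameter are hypothetical names introduced for illustration. Each chunk still creates a temporary on the right-hand side, but that temporary covers only `chunk_rows` rows of the first axis and is released after each assignment.

```python
import numpy as np

def subtract_in_chunks(b, c, out, chunk_rows=10000):
    """Hypothetical helper: fill out with b[:, :, None] - c, chunk_rows rows at a time."""
    n = b.shape[0]
    for start in range(0, n, chunk_rows):
        stop = min(start + chunk_rows, n)
        # The temporary here has shape (stop - start, b.shape[1], c.shape[1]),
        # not the full shape of out.
        out[start:stop] = b[start:stop, :, np.newaxis] - c
    return out

# Usage on small arrays, checked against the one-shot broadcast.
b = np.arange(21.0).reshape(7, 3)
c = np.arange(12.0).reshape(3, 4)
out = np.empty((7, 3, 4))
subtract_in_chunks(b, c, out, chunk_rows=3)
print(np.array_equal(out, b[:, :, np.newaxis] - c))  # True
```

With the question's shapes and `chunk_rows=10000`, each temporary would be 10000 × 192 × 32 × 8 bytes, about 0.49 GB. For a single subtraction, passing `out=` is already enough. Chunking becomes more useful when the expression has several steps, each of which would otherwise produce its own full-size temporary.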
