Numpy set array memory

Problem description

I have a question regarding NumPy's memory views:

Suppose we have two arrays, each with its own memory:

import numpy as np
import gc
x = np.arange(4*3).reshape(4,3).astype(float)
y = (np.arange(5) - 5).astype(float)
y_ref = y

We use these (x, y) in a framework, so we cannot simply redefine them, as the user may have kept references to them (as in y_ref). Now we want to combine their memory into a single view, say p, that shares memory with both arrays.

I did it in the following way, but I do not know whether this causes a memory leak:

p = np.empty(x.size+y.size, dtype=float) # create new memory block with right size
c = 0 # current point in memory

# x
p[c:c+x.size].flat = x.flat # set the memory for combined array p
x.data = p[c:c+x.size].data # now set the buffer of x to be the right length buffer of p

c += x.size

# y
p[c:c+y.size].flat = y.flat # set the memory for combined array p
y.data = p[c:c+y.size].data # and set the buffer of y to be the right length buffer of p

Thus, we can now operate on the single view p or on either of the arrays, without having to redefine every single reference to them:

x[3] = 10
print(p[3*3:4*3])
# [ 10.  10.  10.]

Even y_ref sees the update:

print(y[0])      # -5
y_ref[0] = 100
print(p[x.size]) # 100

Is this the correct way of setting the memory of an array to be a view into another array?

Is there an obvious way of unifying the memory of arrays, which I am blatantly missing?
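For comparison, the reverse pattern is worth sketching: allocate the pooled buffer first and hand out views into it. This is only a sketch of an alternative, and it assumes you control the creation of x and y in the first place, which the framework constraint above may not allow:

```python
import numpy as np

# Allocate the combined buffer first, then create x and y as views
# into it -- no .data reassignment is ever needed.
p = np.empty(4 * 3 + 5, dtype=float)
x = p[:4 * 3].reshape(4, 3)  # view into the first 12 elements of p
y = p[4 * 3:]                # view into the last 5 elements of p

# Fill the views in place; this writes through to p.
x[...] = np.arange(4 * 3).reshape(4, 3)
y[...] = np.arange(5) - 5

x[3] = 10                    # visible through p[9:12]
print(np.shares_memory(p, x), np.shares_memory(p, y))  # True True
```

Here `np.shares_memory` confirms that both arrays alias the pooled buffer, so any write through x, y, or p is seen by all of them.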

I am not sure what will happen to the old data buffers of x and y, now that they are out of scope. Will they get deallocated?

Update (thanks @Jaime):

p.size can get very large (into the billions) on the datasets I am applying this to (microbiology). Also, this scheme gets used in a framework with potentially deep structures, so updating all local references can get expensive. All parameters need to be updated inside an optimization loop, so it is crucial to keep everything in memory.

Actually, your approach is what I started from in the first place, as using Python hierarchy traversals to update all local copies was inefficient.

Answer

According to the source code, the old data buffer will be freed.

https://github.com/numpy/numpy/blob/6c6ddaf62e0556919a57d510e13ccb2e6cd6e043/numpy/core/src/multiarray/getset.c#L329

But if the old buffer is referenced by another array, this will cause problems:

import numpy as np

a = np.zeros(10)
b = np.zeros(10)
c = a[:]         # c is a view into a's original buffer
a.data = b.data  # a's old buffer is freed, but c still points at it
print(c)         # undefined behaviour: c is now a dangling view
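A sketch of a safer variant (my illustration, not part of the original answer): when only the values need to change, an in-place copy avoids the dangling-view hazard entirely, because no buffer is repointed:

```python
import numpy as np

a = np.zeros(10)
b = np.ones(10)
c = a[:]   # a live view into a's buffer

# Copy values in place instead of rebinding a.data:
# a's buffer survives, so the view c stays valid and sees the update.
a[:] = b
print(c[0])  # 1.0
```

The trade-off is that this does not unify the memory of a and b; it only transfers the values, which is sufficient whenever existing views must remain valid.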
