Size of numpy strided array/broadcast array in memory?

Question

I'm trying to create efficient broadcast arrays in numpy, e.g. a set of shape=[1000,1000,1000] arrays that have only 1000 elements, but repeated 1e6 times. This can be achieved both through np.lib.stride_tricks.as_strided and np.broadcast_arrays.

However, I am having trouble verifying that there is no duplication in memory, and this is critical since tests that actually duplicate the arrays in memory tend to crash my machine leaving no traceback.

I've tried examining the size of the arrays using .nbytes, but that doesn't seem to correspond to the actual memory usage:

>>> import numpy as np
>>> import resource
>>> initial_memuse = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> pagesize = resource.getpagesize()
>>>
>>> x = np.arange(1000)
>>> memuse_x = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of x = {0} MB".format(x.nbytes/1e6))
Size of x = 0.008 MB
>>> print("Memory used = {0} MB".format((memuse_x-initial_memuse)*resource.getpagesize()/1e6))
Memory used = 150.994944 MB
>>>
>>> y = np.lib.stride_tricks.as_strided(x, [1000,10,10], strides=x.strides + (0, 0))
>>> memuse_y = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of y = {0} MB".format(y.nbytes/1e6))
Size of y = 0.8 MB
>>> print("Memory used = {0} MB".format((memuse_y-memuse_x)*resource.getpagesize()/1e6))
Memory used = 201.326592 MB
>>>
>>> z = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0))
>>> memuse_z = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
>>> print("Size of z = {0} MB".format(z.nbytes/1e6))
Size of z = 80.0 MB
>>> print("Memory used = {0} MB".format((memuse_z-memuse_y)*resource.getpagesize()/1e6))
Memory used = 0.0 MB

So .nbytes reports the "theoretical" size of the array, but apparently not the actual size. The resource checking is a little awkward, as it looks like there are some things being loaded & cached (perhaps?) that result in the first striding taking up some amount of memory, but future strides take none.

tl;dr: How do you determine the actual size of a numpy array or array view in memory?

Recommended answer

One way would be to examine the .base attribute of the array, which references the object from which an array "borrows" its memory. For example:

x = np.arange(1000)
print(x.flags.owndata)      # x "owns" its data
# True
print(x.base is None)       # its base is therefore 'None'
# True

a = x.reshape(100, 10)      # a is a reshaped view onto x
print(a.flags.owndata)      # it therefore "borrows" its data
# False
print(a.base is x)          # its .base is x
# True

Things are slightly more complicated with np.lib.stride_tricks:

b = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0))

print(b.flags.owndata)
# False
print(b.base)   
# <numpy.lib.stride_tricks.DummyArray object at 0x7fb40c02b0f0>

Here, b.base is a numpy.lib.stride_tricks.DummyArray instance, which looks like this:

class DummyArray(object):
    """Dummy object that just exists to hang __array_interface__ dictionaries
    and possibly keep alive a reference to a base array.
    """

    def __init__(self, interface, base=None):
        self.__array_interface__ = interface
        self.base = base

We can therefore check b.base.base:

print(b.base.base is x)
# True

Once you have the base array then its .nbytes attribute should accurately reflect the amount of memory it occupies.
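A complementary check, sketched here as an illustration rather than as part of the original answer, is to ask numpy directly whether the large array overlaps the small one in memory. The sketch assumes a numpy version that provides np.shares_memory and np.broadcast_to; the names z, w and c are chosen for this example only, and dtype=np.int64 is set explicitly so the byte counts are the same on every platform.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(1000, dtype=np.int64)   # 8000 bytes of real data

# Two ways to repeat x along two new axes without copying.
z = as_strided(x, shape=(1000, 100, 100), strides=x.strides + (0, 0))
w = np.broadcast_to(x[:, None, None], (1000, 100, 100))

for name, arr in [("z", z), ("w", w)]:
    # nbytes is shape * itemsize, so it reports 80000000 for both,
    # while the zero strides and the overlap with x show no data was copied.
    print(name, arr.nbytes, arr.strides, np.shares_memory(arr, x))

# An explicit copy of a small slice, by contrast, owns fresh memory.
c = z[:, :2, :2].copy()
print(c.flags.owndata, np.shares_memory(c, x))
```

A stride of 0 along an axis means every index on that axis maps to the same bytes, which is why the repeated axes cost nothing.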

In principle it's possible to have a view of a view of an array, or to create a strided array from another strided array. Assuming that your view or strided array is ultimately backed by another numpy array, you could recursively reference its .base attribute. Once you find an object whose .base is None, you have found the underlying object from which your array is borrowing its memory:

def find_base_nbytes(obj):
    if obj.base is not None:
        return find_base_nbytes(obj.base)
    return obj.nbytes

As expected:

print(find_base_nbytes(x))
# 8000

print(find_base_nbytes(y))
# 8000

print(find_base_nbytes(z))
# 8000
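The recursive helper assumes every object in the chain has a .base attribute and that the final owner is a numpy array. A slightly more defensive variant might look like the sketch below. The names find_owner and backing_nbytes are hypothetical, and the memoryview fallback assumes that a non-array owner (for example the buffer behind np.frombuffer) supports the buffer protocol.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def find_owner(obj):
    """Follow .base links until reaching an object with no further base."""
    while getattr(obj, "base", None) is not None:
        obj = obj.base
    return obj

def backing_nbytes(arr):
    """Size in bytes of the buffer that ultimately backs arr."""
    owner = find_owner(arr)
    if isinstance(owner, np.ndarray):
        return owner.nbytes
    # Assumption: a non-ndarray owner exposes the buffer protocol.
    return memoryview(owner).nbytes

x = np.arange(1000, dtype=np.int64)
z = as_strided(x, shape=(1000, 100, 100), strides=x.strides + (0, 0))
view_of_view = x.reshape(100, 10)[::2]

print(backing_nbytes(z))             # 8000
print(backing_nbytes(view_of_view))  # 8000
```

The loop also avoids hitting Python's recursion limit on very long chains of views, although such chains are unlikely in practice.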
