Memory consumption of NumPy function for standard deviation

Question

I'm currently using the Python bindings of GDAL to work on quite large raster data sets (> 4 GB). Since loading them into memory at once is not a feasible solution for me, I read them in smaller blocks and do the computations piece by piece. To avoid a new allocation for every block read, I'm using the buf_obj argument (here) to read the values into a preallocated NumPy array. At one point I have to compute the mean and standard deviation of the entire raster. Naturally I've used np.std for the computation. However, by profiling the memory consumption of my program I realized that with each invocation of np.std, additional memory is allocated and released.

A minimum working example which demonstrates this behavior:

In [1]: import numpy as np
In [2]: a = np.random.rand(int(20e6))  # Approx. 150 MiB of memory
In [3]: %memit np.mean(a)
peak memory: 187.30 MiB, increment: 0.48 MiB
In [4]: %memit np.std(a)
peak memory: 340.24 MiB, increment: 152.91 MiB

A search within the source tree of NumPy on GitHub revealed that the np.std function internally invokes the _var function from _methods.py (here). At one point _var computes the deviations from the mean and sums them up, so a temporary copy of the input array is created. The function essentially computes the standard deviation as follows:

mu = np.sum(arr) / len(arr)
tmp = arr - mu              # temporary copy, same size as the input
tmp = tmp * tmp
sd = np.sqrt(np.sum(tmp) / len(arr))

While this approach is OK for smaller input arrays, it is definitely not viable for larger ones. Since I'm using smaller blocks of memory as mentioned before, this additional copy is not a game-breaking issue from the memory point of view in my program. However, what bugs me is that for each block a new allocation is made and released before the next block is read.

Is there some other function within NumPy or SciPy that utilizes an approach with constant memory consumption, such as the Welford algorithm (Wikipedia), for one-pass computation of the mean and standard deviation?

Another way to go would be to implement a custom version of the _var function with an optional out argument for a preallocated buffer (like NumPy ufuncs have). With this approach the additional copy would not be eliminated, but at least the memory consumption would be constant and the runtime spent on allocations for each block would be saved.
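
A minimal sketch of what such a helper could look like is shown below. The function name block_stats and its signature are hypothetical (this is not an existing NumPy API); the idea is simply that the caller owns the scratch buffer, so reading block after block causes no further allocations.

import numpy as np

def block_stats(block, scratch):
    """Hypothetical helper: mean and standard deviation of one block,
    using a caller-provided scratch buffer instead of letting np.std
    allocate a temporary array on every call."""
    work = scratch[:block.size]          # view into the reusable buffer
    mu = block.mean()
    np.subtract(block, mu, out=work)     # deviations, written in place
    np.multiply(work, work, out=work)    # squared deviations, in place
    return mu, np.sqrt(work.sum() / block.size)

# One scratch buffer, reused for every block that is read:
scratch = np.empty(10**6)
block = np.random.rand(10**6)
mu, sd = block_stats(block, scratch)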

Edit: Tested the Cython implementation of the Welford algorithm as suggested by kezzos.

Cython implementation (modified from kezzos):

cimport cython
cimport numpy as np
from libc.math cimport sqrt

@cython.boundscheck(False)
def iterative_approach(np.ndarray[np.float32_t, ndim=1] a):
    cdef long n = 0
    cdef float mean = 0
    cdef float M2 = 0
    cdef long i
    cdef float delta
    cdef float a_min = 10000000  # Must be set to Inf and -Inf for real cases
    cdef float a_max = -10000000
    for i in range(len(a)):
        n += 1
        delta = a[i] - mean
        mean += delta / n
        M2 += delta * (a[i] - mean)
        if a[i] < a_min:
            a_min = a[i]
        if a[i] > a_max:
            a_max = a[i]
    return a_min, a_max, mean, sqrt(M2 / (n - 1))
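
For reference, one possible way to build and call this function is via pyximport (the module name welford.pyx is an assumption here; any Cython build setup works):

import numpy as np
import pyximport
pyximport.install(setup_args={"include_dirs": np.get_include()})

from welford import iterative_approach   # compiled from welford.pyx

a = np.random.rand(10**6).astype(np.float32)   # the function expects float32
a_min, a_max, mean, sd = iterative_approach(a)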

NumPy implementation (mean and std can possibly be computed in one function):

def vector_approach(a):
    return np.min(a), np.max(a), np.mean(a), np.std(a, ddof=1)

Test results using a random data set (times in milliseconds, best of 25):

----------------------------------
| Size |  Iterative |     Vector |
----------------------------------
|  1e2 |    0.00529 |    0.17149 |
|  1e3 |    0.02027 |    0.16856 |
|  1e4 |    0.17850 |    0.23069 |
|  1e5 |    1.93980 |    0.77727 |
|  1e6 |   18.78207 |    8.83245 |
|  1e7 |  180.04069 |  101.14722 |
|  1e8 | 1789.60228 | 1086.66737 |
----------------------------------

It seems like the iterative approach using Cython is faster for smaller data sets, while the NumPy vector (possibly SIMD-accelerated) approach is faster for larger data sets with 10000+ elements. All tests were conducted with Python 2.7.9 and NumPy version 1.9.2.

Note that in the real case the above functions would be used to compute the statistics for a single block of the raster. The standard deviations and means of all blocks are then combined with the methodology suggested in Wikipedia (here). This has the advantage that not all elements of the raster need to be summed up, which avoids the float overflow problem (at least to some degree).
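
A sketch of how two blocks' partial results could be combined with that pairwise formula (the function combine_stats and the variable names are my own, not from the question; repeating the update over all blocks yields the global mean and standard deviation):

import numpy as np

def combine_stats(n_a, mean_a, M2_a, n_b, mean_b, M2_b):
    """Combine count, mean and sum of squared deviations (M2) of two
    blocks using the parallel update described on Wikipedia."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    M2 = M2_a + M2_b + delta * delta * n_a * n_b / n
    return n, mean, M2

# Fold the per-block results into running totals:
blocks = [np.random.rand(1000).astype(np.float32) for _ in range(5)]
n_tot, mean_tot, M2_tot = 0, 0.0, 0.0
for b in blocks:
    mean_b = b.mean()
    M2_b = float(((b - mean_b) ** 2).sum())
    n_tot, mean_tot, M2_tot = combine_stats(n_tot, mean_tot, M2_tot,
                                            b.size, mean_b, M2_b)

sd_tot = np.sqrt(M2_tot / (n_tot - 1))   # ddof=1, matching the Cython version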

Answer

I doubt you will find any such functions in numpy. The raison d'être of numpy is that it takes advantage of vector processor instruction sets, performing the same instruction on large amounts of data. Basically numpy trades memory efficiency for speed efficiency. However, due to the memory-intensive nature of Python, numpy is also able to achieve certain memory efficiencies by associating the data type with the array as a whole and not with each individual element.

One way to improve the speed, while still sacrificing some memory overhead, is to calculate the standard deviation in chunks, e.g.:

import numpy as np

def std(arr, blocksize=1000000):
    """Written for py3, change range to xrange for py2.
    This implementation requires the entire array in memory, but it shows how you can
    calculate the standard deviation in a piecemeal way.
    """
    num_blocks, remainder = divmod(len(arr), blocksize)
    mean = arr.mean()
    tmp = np.empty(blocksize, dtype=float)
    total_squares = 0
    for start in range(0, blocksize*num_blocks, blocksize):
        # get a view of the data we want -- views do not "own" the data they point to
        # -- they have minimal memory overhead
        view = arr[start:start+blocksize]
        # in-place operations prevent a new array from being created
        np.subtract(view, mean, out=tmp)
        tmp *= tmp
        total_squares += tmp.sum()
    if remainder:
        # len(arr) % blocksize != 0 and need process last part of array
        # create copy of view, with the smallest amount of new memory allocation possible
        # -- one more array *view*
        view = arr[-remainder:]
        tmp = tmp[-remainder:]
        np.subtract(view, mean, out=tmp)
        tmp *= tmp
        total_squares += tmp.sum()

    var = total_squares / len(arr)
    sd = var ** 0.5
    return sd

a = np.arange(20e6)
assert np.isclose(np.std(a), std(a))

Showing the speedup: the larger the blocksize, the larger the speedup, and the memory overhead is considerably lower. I'm not entirely sure the lower memory overhead figure is 100% accurate.

In [70]: %timeit np.std(a)
10 loops, best of 3: 105 ms per loop

In [71]: %timeit std(a, blocksize=4096)
10 loops, best of 3: 160 ms per loop

In [72]: %timeit std(a, blocksize=1000000)
10 loops, best of 3: 105 ms per loop

In [73]: %memit std(a, blocksize=4096)
peak memory: 360.11 MiB, increment: 0.00 MiB

In [74]: %memit std(a, blocksize=1000000)
peak memory: 360.11 MiB, increment: 0.00 MiB

In [75]: %memit np.std(a)
peak memory: 512.70 MiB, increment: 152.59 MiB
