Improving memory usage in an array-wide filter to avoid block-processing


Question


I am implementing some satellite image filters, starting with one known as the Enhanced Lee filter. The images are easily up to 5000x5000 pixels and more. My current implementation runs out of memory trying to compute the filters on those large arrays (note that the moving average and moving stddev filters can each be run in one shot). The main difficulty is the number of arrays that must be kept in memory in order to return the final filtered array. In this question (http://stackoverflow.com/questions/4958652/calling-filter-functions-from-within-a-block-by-block-processing-function), I asked for help on refining a block-processing function, but my question here is: is there a way of improving this code so that I don't need to use block-processing?

    def moving_average(Ic, filtsize):
        Im = numpy.empty(Ic.shape, dtype='Float32')
        scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
        return Im

    def moving_stddev(Ic, filtsize):
        Im = numpy.empty(Ic.shape, dtype='Float32')
        scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
        S = numpy.empty(Ic.shape, dtype='Float32')
        scipy.ndimage.filters.uniform_filter(((Ic-Im) ** 2), filtsize, output=S)
        return numpy.sqrt(S)

    def enh_lee(Ic, filtsize, nlooks, dfactor):
        # Implementation based on PCI Geomatica's FELEE function documentation
        Ci = moving_stddev(Ic, filtsize) / moving_average(Ic, filtsize) #1st array in memory
        Cu = numpy.sqrt(1 / nlooks) #scalar
        Cmax = numpy.sqrt(1 + (2 * nlooks)) #scalar
        W = numpy.exp(-dfactor * (Ci - Cu) / (Cmax - Ci)) #2nd array in memory
        Im = moving_average(Ic, filtsize) #3rd array in memory
        If = Im * W + Ic * (1 - W) #4th array in memory
        W = None # Back to 3 arrays in memory
        return numpy.select([Ci <= Cu, (Cu < Ci) * (Ci < Cmax), Ci >= Cmax], [Im, If, Ic])

where nlooks and dfactor are scalars and Ic is the unfiltered array.
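To see why the final numpy.select line is the memory hog, here is a tiny sketch (with invented toy values standing in for the real image arrays) of its per-element semantics: every condition array and every choice array is built in full before select picks between them.

```python
import numpy as np

# Invented toy values standing in for the real image arrays.
Ci = np.array([0.1, 0.5, 2.0], dtype=np.float32)   # variation coefficient
Cu, Cmax = 0.3, 1.5                                # illustrative thresholds
Im = np.array([10., 20., 30.], dtype=np.float32)   # moving average
If = np.array([11., 21., 31.], dtype=np.float32)   # weighted filter result
Ic = np.array([12., 22., 32.], dtype=np.float32)   # unfiltered input

# All three boolean condition arrays and all three choice arrays exist
# simultaneously here -- six full-size arrays for a 5000x5000 image.
out = np.select([Ci <= Cu, (Cu < Ci) & (Ci < Cmax), Ci >= Cmax],
                [Im, If, Ic])
print(out)  # [10. 21. 32.] -- Im, If, Ic element-wise by regime
```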

EDIT: Based on your suggestions (I am also looking at numexpr), my improved code for enh_lee is as follows, but it is still not enough to get past the last step without running out of memory:

def enh_lee(Ic, filtsize, nlooks, dfactor):
    Im = moving_average(Ic, filtsize)
    Ci = moving_stddev(Ic, filtsize)
    Ci /= Im
    Cu = numpy.sqrt(1 / nlooks)
    Cmax = numpy.sqrt(1 + (2 * nlooks))

    W = Ci
    W -= Cu
    W /= Cmax - Ci
    W *= -dfactor
    numpy.exp(W, W)

    If = 1
    If -= W
    If *= Ic
    If += Im * W
    W = None
    return numpy.select([Ci <= Cu, (Cu < Ci) & (Ci < Cmax), Ci >= Cmax], [Im, If, Ic])

Solution

There are several (memory usage) optimizations you can make here... A few tricks to keep in mind are:

  1. Most numpy functions take an out parameter that can be used to specify the output array instead of returning a copy. E.g. np.sqrt(x, x) will take the square root of an array in-place.
  2. x += 1 uses half the memory that x = x + 1 does, as the latter makes a temporary copy. When it's possible, try to split calculations up into *=, +=, /=, etc. When it's not, use numexpr, as @eumiro suggested. (Or just use numexpr regardless... It's quite handy in many cases.)
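Both tricks in a minimal, self-contained sketch:

```python
import numpy as np

x = np.full(4, 9.0, dtype=np.float32)

# Trick 1: the out parameter reuses an existing buffer instead of
# allocating a fresh result array.
np.sqrt(x, out=x)   # x is now [3. 3. 3. 3.], computed in-place

# Trick 2: augmented assignment mutates x directly; `x = x + 1` would
# first build a full-size temporary holding the sum.
x += 1.0
print(x)  # [4. 4. 4. 4.]
```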

So, first off, here's your original function's performance with a 10000x10000 array of random data and a filtsize of 3:

Memory Usage Profile of Original Function

The interesting part is the big spikes at the end. These occur during your numpy.select(...) bit. There are plenty of places where you're inadvertently creating additional temporary arrays, but they're mostly irrelevant, as they're overwhelmed by what goes on during the call to select.
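The answer doesn't say which tool produced these profiles (an external memory profiler, presumably). As a rough, dependency-free way to compare the peak allocation of individual steps, Python's built-in tracemalloc can stand in, since modern NumPy reports its buffer allocations to it:

```python
import tracemalloc
import numpy as np

def peak_mib(func, arg):
    """Run func(arg) and return the peak traced allocation in MiB."""
    tracemalloc.start()
    func(arg)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 2**20

x = np.ones(1_000_000, dtype=np.float32)         # ~4 MiB of data

copying = peak_mib(lambda a: a + 1, x)           # builds a temporary
in_place = peak_mib(lambda a: a.__iadd__(1), x)  # mutates a directly
print(copying > in_place)  # the copying version peaks higher
```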


At any rate, if we replace your original (clean and compact) code with this rather verbose version, you can significantly optimize your memory usage:

import numpy
import scipy.ndimage

def main(x=None):
    if x is None:
        ni, nj = 10000, 10000
        x = numpy.arange(ni*nj, dtype=numpy.float32).reshape(ni,nj)
    filtsize = 3
    nlooks = 10.0
    dfactor = 10.0
    x = enh_lee(x, filtsize, nlooks, dfactor)
    return x

def moving_average(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    return Im

def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    Im *= -1
    Im += Ic
    Im **= 2
    scipy.ndimage.filters.uniform_filter(Im, filtsize, output=Im)
    return numpy.sqrt(Im, Im)

def enh_lee(Ic, filtsize, nlooks, dfactor):
    # Implementation based on PCI Geomatica's FELEE function documentation
    Ci = moving_stddev(Ic, filtsize) 
    Im = moving_average(Ic, filtsize) 
    Ci /= Im

    Cu = numpy.sqrt(1 / nlooks).astype(numpy.float32) 
    Cmax = numpy.sqrt(1 + (2 * nlooks)).astype(numpy.float32) 

    W = Ci.copy()
    W -= Cu
    W *= -dfactor
    W /= Cmax - Ci
    W = numpy.exp(W, W)

    If = Im * W
    W *= -1
    W += 1
    W *= Ic
    If += W
    del W

    # Replace the call to numpy.select
    out = If
    filter = Ci <= Cu
    numpy.putmask(out, filter, Im)
    del Im

    filter = Ci >= Cmax
    numpy.putmask(out, filter, Ic)
    return out

if __name__ == '__main__':
    main()
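As a quick sanity check (not part of the original answer), the in-place moving_stddev above can be compared against the straightforward two-temporary formulation from the question on a small random array. This sketch uses the modern scipy.ndimage.uniform_filter spelling (scipy.ndimage.filters is the older alias):

```python
import numpy as np
import scipy.ndimage

def moving_stddev_inplace(Ic, filtsize):
    # Same steps as the optimized version above: a single scratch array.
    Im = np.empty(Ic.shape, dtype='float32')
    scipy.ndimage.uniform_filter(Ic, filtsize, output=Im)
    Im *= -1
    Im += Ic
    Im **= 2
    scipy.ndimage.uniform_filter(Im, filtsize, output=Im)
    return np.sqrt(Im, Im)

def moving_stddev_naive(Ic, filtsize):
    # The original two-temporary formulation from the question.
    Im = scipy.ndimage.uniform_filter(Ic, filtsize)
    return np.sqrt(scipy.ndimage.uniform_filter((Ic - Im) ** 2, filtsize))

rng = np.random.default_rng(0)
x = rng.random((50, 50)).astype(np.float32)
same = np.allclose(moving_stddev_inplace(x, 3), moving_stddev_naive(x, 3),
                   atol=1e-5)
print(same)
```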

Here's the resulting memory profile for this code:

Memory Usage Profile of Numpy-based Optimized Version

So, we've greatly reduced memory usage, but the code is somewhat less readable (i.m.o.).

However, in an earlier revision of this code, those last peaks were two numpy.where calls...

If numpy.where took an out parameter, we could reduce the peak memory usage by another ~300MB or so. Unfortunately, it doesn't, and I don't know of a more memory-efficient way to do it with where...

We can use numpy.putmask to replace the call to numpy.select and do the operation in-place. (Thanks to @eumiro for mentioning this in an entirely different question: http://stackoverflow.com/questions/4992040/python-numpy-boolean-array-negation-in-where-statement/4992152#4992152)
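A small illustration of the distinction (invented values): numpy.putmask writes into its first argument in-place, while numpy.where always returns a newly allocated array.

```python
import numpy as np

out = np.array([1., 2., 3., 4.], dtype=np.float32)
repl = np.array([10., 20., 30., 40.], dtype=np.float32)
mask = np.array([True, False, True, False])

np.putmask(out, mask, repl)   # mutates out: [10.  2. 30.  4.]
print(out)

# np.where computes the same selection but as a brand-new array,
# which is where the extra peak memory comes from.
alt = np.where(mask, repl, out)
print(alt is out)  # False -- a separate allocation
```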

If we optimize things with numexpr, we get considerably cleaner code (compared to the pure-numpy version above, not the original). You could probably whittle the memory usage down a bit in this version... I'm not terribly familiar with numexpr, beyond having used it a few times.

import numpy
import scipy.ndimage
import numexpr as ne

def main(x=None):
    if x is None:
        ni, nj = 10000, 10000
        x = numpy.arange(ni*nj, dtype=numpy.float32).reshape(ni,nj)
    filtsize = 3
    nlooks = 10.0
    dfactor = 10.0
    x = enh_lee(x, filtsize, nlooks, dfactor)
    return x

def moving_average(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    return Im

def moving_stddev(Ic, filtsize):
    Im = numpy.empty(Ic.shape, dtype='Float32')
    scipy.ndimage.filters.uniform_filter(Ic, filtsize, output=Im)
    Im = ne.evaluate('((Ic-Im) ** 2)')
    scipy.ndimage.filters.uniform_filter(Im, filtsize, output=Im)
    return ne.evaluate('sqrt(Im)')

def enh_lee(Ic, filtsize, nlooks, dfactor):
    # Implementation based on PCI Geomatica's FELEE function documentation
    Ci = moving_stddev(Ic, filtsize) 
    Im = moving_average(Ic, filtsize) 
    Ci /= Im

    Cu = numpy.sqrt(1 / nlooks).astype(numpy.float32) 
    Cmax = numpy.sqrt(1 + (2 * nlooks)).astype(numpy.float32) 

    W = ne.evaluate('exp(-dfactor * (Ci - Cu) / (Cmax - Ci))') 
    If = ne.evaluate('Im * W + Ic * (1 - W)') 
    del W

    out = ne.evaluate('where(Ci <= Cu, Im, If)') 
    del Im
    del If

    out = ne.evaluate('where(Ci >= Cmax, Ic, out)')
    return out

if __name__ == '__main__':
    main()

And here's the memory-usage profile for the numexpr version: (Note that the execution time has been more than halved compared to the original!)

Memory Usage Profile of Numexpr-based Optimized Version

The largest memory usage is still during the calls to where (replacing the call to select). However, the peak memory usage has been significantly cut. The easiest way to reduce it further would be to find some way to make the select step operate in-place on one of the arrays. It would be fairly easy to do this with Cython (nested loops would be rather slow in pure Python, and any sort of boolean indexing in numpy will create an additional copy). You may be better off simply chunking the input array as you've been doing, though...
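One addendum that postdates the answer: NumPy 1.7 added numpy.copyto, whose where= argument performs exactly this kind of selection in-place, with no Cython and no boolean-indexing copy (values below are invented for illustration):

```python
import numpy as np

out = np.array([1., 2., 3.], dtype=np.float32)
src = np.array([10., 20., 30.], dtype=np.float32)
mask = np.array([True, False, True])

# Copies src into out only where mask is True; out is modified in-place
# and no new full-size array is allocated.
np.copyto(out, src, where=mask)
print(out)  # [10.  2. 30.]
```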

Just as a side note, the updated versions produce the same output as the original code. There was a typo in the original numpy-based code...
