Vectorize a Python loop over a NumPy array

Question

I need to speed up this loop because it is very slow, but I don't know how to vectorize it, since each output value depends on the result for the previous value. Any suggestions?

import numpy as np

sig = np.random.randn(44100)
alpha = .9887
beta = .999

out = np.zeros_like(sig)

for n in range(1, len(sig)):
    if np.abs(sig[n]) >= out[n-1]:
        out[n] = alpha * out[n-1] + (1 - alpha) * np.abs( sig[n] )
    else:
        out[n] = beta * out[n-1]

Recommended answer

Low vectorisation potential in "forward-dependent loop" code

Most of the "vectorisation" parallelism is off the table once the dependency is analysed, because out[n] cannot be computed until out[n-1] is known. (A JIT compiler cannot vectorise "against" such a dependence barrier either.)
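To make the dependency concrete, here is a minimal sketch (not part of the original answer) that compares the loop from the question with a naive np.where attempt. The function names and the seeded test signal are illustrative assumptions. The naive version reads out[n-1] before any of those values have been updated, so only the first step can agree with the loop.

```python
import numpy as np

def reference_loop(sig, alpha=0.9887, beta=0.999):
    """The loop from the question, wrapped in a function."""
    out = np.zeros_like(sig)
    for n in range(1, len(sig)):
        if np.abs(sig[n]) >= out[n - 1]:
            out[n] = alpha * out[n - 1] + (1 - alpha) * np.abs(sig[n])
        else:
            out[n] = beta * out[n - 1]
    return out

def naive_where(sig, alpha=0.9887, beta=0.999):
    """Incorrect 'vectorised' attempt: every previous value it reads is still zero."""
    mag = np.abs(sig)
    out = np.zeros_like(sig)
    prev = out[:-1].copy()  # snapshot of stale values, all zeros
    out[1:] = np.where(mag[1:] >= prev,
                       alpha * prev + (1 - alpha) * mag[1:],
                       beta * prev)
    return out

sig = np.random.default_rng(0).standard_normal(1000)  # seeded, illustrative signal
ref = reference_loop(sig)
naive = naive_where(sig)
print(np.isclose(ref[1], naive[1]))  # step 1 starts from a genuine zero in both versions
print(np.allclose(ref, naive))       # later steps need the updated out[n-1]
```

The second comparison is the point of the exercise: whole-array expressions evaluate every element from the same snapshot, while the recurrence needs each element to see its freshly computed predecessor.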

You can pre-calculate some re-used values in a vectorised manner, but there is no direct Python syntax (short of an external JIT-compiler workaround) for arranging a forward-shifting-dependence loop into parallel computation aligned to the CPU's vector registers:

from zmq import Stopwatch    # ok to use pyzmq 2.11 for [usec] .Stopwatch()
aStopWATCH =    Stopwatch()  # a performance measurement .Stopwatch() instance

sig    = np.abs(sig)         # self-destructive calc/assign avoids memalloc-OPs
aConst = ( 1 - alpha )       # avoids many repetitive SUB(s) in the loop

for thisPtr in range( 1, len( sig ) ): # FORWARD-SHIFTING-DEPENDENCE LOOP:
    prevPtr = thisPtr - 1              # prevPtr->"previous" TimeSlice in out[] ( re-used 2 x len(sig) times )
    if sig[thisPtr] < out[prevPtr]:                                    # 1st re-use
       out[thisPtr] = out[prevPtr] * beta                              # 2nd
    else:
       out[thisPtr] = out[prevPtr] * alpha + ( aConst * sig[thisPtr] ) # 2nd
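The same idea can be packaged as a self-contained function. The sketch below is an assumption-laden variant rather than the answer's own code: the name envelope, the use of .tolist(), and the local prev variable are additions for illustration. It keeps the vectorisable work (the absolute value and the constant) outside the loop and carries out[n-1] in a plain Python float, which usually costs less than indexing a NumPy array element by element.

```python
import numpy as np

def envelope(sig, alpha=0.9887, beta=0.999):
    """Sequential envelope follower with the vectorisable parts hoisted out."""
    mag = np.abs(sig).tolist()  # one vectorised abs(), then plain Python floats
    gain = 1.0 - alpha          # loop-invariant constant
    out = [0.0] * len(mag)
    prev = 0.0                  # holds out[n-1]; out[0] stays 0 as in the question
    for n in range(1, len(mag)):
        x = mag[n]
        if x < prev:
            prev = prev * beta
        else:
            prev = prev * alpha + gain * x
        out[n] = prev
    return np.asarray(out)

sig = np.random.default_rng(0).standard_normal(44100)  # seeded, illustrative signal
out = envelope(sig)
print(out[:5])
```

The loop is still sequential, so this does not vectorise the recurrence; it only trims per-iteration overhead.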

A good example of vectorised speed-up arises where the calculation strategy can be parallelised or broadcast along the 1D, 2D or even 3D structure of a native numpy array. For a speed-up of about 100x, see the RGBA 2D-matrix accelerated processing in "Vectorised code for a PNG picture processing (an OpenGL shader pipeline)".
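For contrast, here is a small sketch of a calculation that does broadcast cleanly, because each pixel is independent of its neighbours. It is not taken from the linked post: the random 4x6 RGBA array is a hypothetical stand-in for an image, and the weights are the Rec. 601 luma coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
rgba = rng.random((4, 6, 4))               # hypothetical 4x6 RGBA image, floats in [0, 1)
weights = np.array([0.299, 0.587, 0.114])  # Rec. 601 luma coefficients

# Vectorised: one broadcast multiply and a sum over the colour axis.
luma_vec = (rgba[..., :3] * weights).sum(axis=-1)

# Equivalent explicit loops, shown only for comparison.
luma_loop = np.empty(rgba.shape[:2])
for row in range(rgba.shape[0]):
    for col in range(rgba.shape[1]):
        r, g, b, _alpha = rgba[row, col]
        luma_loop[row, col] = 0.299 * r + 0.587 * g + 0.114 * b

print(np.allclose(luma_vec, luma_loop))
```

No pixel needs another pixel's result here, which is exactly the property the envelope loop lacks.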

Even this simple Python code revision runs about 2.8x faster (right now, i.e. without installing an ad-hoc JIT-optimising compiler):

>>> def aForwardShiftingDependenceLOOP(): # proposed code-revision
...     aStopWATCH.start()                # ||||||||||||||||||.start
...     for thisPtr in range( 1, len( sig ) ):
...         #        |vvvvvvv|------------# FORWARD-SHIFTING-LOOP DEPENDENCE
...         prevPtr = thisPtr - 1  #|vvvvvvv|--STEP-SHIFTING avoids Numpy syntax
...         if ( sig[ thisPtr] < out[prevPtr] ):
...             out[  thisPtr] = out[prevPtr] * beta
...         else:
...             out[  thisPtr] = out[prevPtr] * alpha + ( aConst * sig[thisPtr] )
...     usec = aStopWATCH.stop()          # ||||||||||||||||||.stop
...     print("%d  [usec]" % usec)        # same output; valid in Python 2 and 3

>>> aForwardShiftingDependenceLOOP()
57593  [usec]
57879  [usec]
58085  [usec]

>>> def anOriginalForLOOP():
...     aStopWATCH.start()
...     for n in range( 1, len( sig ) ):
...         if ( np.abs( sig[n] ) >= out[n-1] ):
...             out[n] = out[n-1] * alpha + ( 1 - alpha ) * np.abs( sig[n] )
...         else:
...             out[n] = out[n-1] * beta
...     usec = aStopWATCH.stop()
...     print("%d  [usec]" % usec)

>>> anOriginalForLOOP()
164907  [usec]
165674  [usec]
165154  [usec]
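If zmq's Stopwatch is not available in your environment, a comparable measurement can be sketched with the standard library. The helper best_usec and the two function names below are hypothetical, and here the np.abs() call is timed together with the revised loop, unlike in the session above. Absolute timings depend on the machine and on the Python and NumPy versions, so none are quoted.

```python
import time
import numpy as np

def original_loop(sig, alpha, beta):
    """The loop from the question."""
    out = np.zeros_like(sig)
    for n in range(1, len(sig)):
        if np.abs(sig[n]) >= out[n - 1]:
            out[n] = out[n - 1] * alpha + (1 - alpha) * np.abs(sig[n])
        else:
            out[n] = out[n - 1] * beta
    return out

def revised_loop(sig, alpha, beta):
    """The answer's revision: abs() and (1 - alpha) computed once, outside the loop."""
    mag = np.abs(sig)
    gain = 1 - alpha
    out = np.zeros_like(mag)
    for this in range(1, len(mag)):
        prev = this - 1
        if mag[this] < out[prev]:
            out[this] = out[prev] * beta
        else:
            out[this] = out[prev] * alpha + gain * mag[this]
    return out

def best_usec(func, *args, repeats=3):
    """Best wall-clock time of func(*args) over `repeats` runs, in microseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best * 1e6

sig = np.random.default_rng(0).standard_normal(44100)  # seeded, illustrative signal
alpha, beta = 0.9887, 0.999
print(best_usec(original_loop, sig, alpha, beta), "[usec] original")
print(best_usec(revised_loop, sig, alpha, beta), "[usec] revised")
```

Taking the best of several runs reduces noise from other processes; the two functions should also be checked to return the same array before their timings are compared.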
