Converting a nested loop calculation to Numpy for speedup


Problem Description


    Part of my Python program contains the following piece of code, where a new grid is calculated based on data found in the old grid.

    The grid is a two-dimensional list of floats. The code uses three for-loops:

    for t in xrange(0, t, step):
        for h in xrange(1, height-1):
            for w in xrange(1, width-1):
                new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] + t * gr[h+1][w-1]-2 * (gr[h][w-1] + t * gr[h-1][w])
        gr = new_gr
    
    return gr
    

    The code is extremely slow for a large grid and a large time t.

    I've tried to use Numpy to speed up this code, by substituting the inner loop with:

    J = np.arange(1, width-1)
    new_gr[h][J] = gr[h][J] + gr[h][J-1] ...
    

    But the results produced (the floats in the array) are about 10% smaller than their list-calculation counterparts.

    • What loss of accuracy is to be expected when converting lists of floats to Numpy array of floats using np.array(pylist) and then doing a calculation?

    • How should I go about converting a triple for-loop to pretty and fast Numpy code? (or are there other suggestions for speeding up the code significantly?)

    Solution

    If gr is a list of floats, the first step if you are looking to vectorize with NumPy would be to convert gr to a NumPy array with np.array().

    Next up, I am assuming that you have new_gr initialized with zeros of shape (height,width). The calculations being performed in the two innermost loops basically represent 2D convolution. So, you can use signal.convolve2d with an appropriate kernel. To decide on the kernel, we need to look at the scaling factors and place them into a 3 x 3 kernel, keeping in mind that convolve2d flips the kernel along both axes and that the [1:-1,:-2] indexing shifts the result one column to the right. Thus, you would have a vectorized solution with the two innermost loops being removed for better performance, like so -

    import numpy as np
    from scipy import signal
    
    # Get the scaling factors and negate them to get kernel
    kernel = -np.array([[0,-t,0],[-1,1,0],[2*t-1,0,0]])
    
    # Initialize output array and run 2D convolution and set values into it
    out = np.zeros((height,width))
    out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2]
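
If SciPy is not available, the two innermost loops can also be vectorized with plain NumPy slicing, where each slice is the grid interior shifted by the offsets used in the loop body. A minimal sketch (the name sliced_update is chosen here, not taken from the question):

```python
import numpy as np

def sliced_update(gr, t):
    # Vectorized form of the two inner loops: each slice below is the
    # interior of the grid shifted by the corresponding loop offset.
    height, width = gr.shape
    new_gr = np.zeros((height, width))
    new_gr[1:-1, 1:-1] = (
        gr[1:-1, 1:-1]                    # gr[h][w]
        + gr[1:-1, :-2]                   # gr[h][w-1]
        + gr[:-2, 1:-1]                   # gr[h-1][w]
        + t * gr[2:, :-2]                 # t * gr[h+1][w-1]
        - 2 * (gr[1:-1, :-2] + t * gr[:-2, 1:-1])
    )
    return new_gr
```

This form is also a handy cross-check for the convolution result, since it maps one-to-one onto the loop body.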
    

    Verify output and runtime tests

    Define functions :

    def org_app(gr,t):
        new_gr = np.zeros((height,width))
        for h in xrange(1, height-1):
            for w in xrange(1, width-1):
                new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] + t * gr[h+1][w-1]-2 * (gr[h][w-1] + t * gr[h-1][w]) 
        return new_gr
    
    def proposed_app(gr,t):
        kernel = -np.array([[0,-t,0],[-1,1,0],[2*t-1,0,0]])
        out = np.zeros((height,width))
        out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2]
        return out
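
Folding the vectorized step back into the question's outer time loop gives a full replacement for the triple loop. A sketch, assuming the loop bound is renamed t_max (the question's `for t in xrange(0, t, step)` reuses t as both bound and loop variable) and using the slicing form so only NumPy is needed; run_simulation and t_max are names chosen for this sketch:

```python
import numpy as np

def run_simulation(gr, t_max, step):
    # Outer time loop kept as in the question; only the two spatial
    # loops are replaced by one vectorized interior update per step.
    gr = np.asarray(gr, dtype=float)
    height, width = gr.shape
    for t in range(0, t_max, step):
        new_gr = np.zeros((height, width))
        new_gr[1:-1, 1:-1] = (
            gr[1:-1, 1:-1] + gr[1:-1, :-2] + gr[:-2, 1:-1]
            + t * gr[2:, :-2]
            - 2 * (gr[1:-1, :-2] + t * gr[:-2, 1:-1])
        )
        gr = new_gr  # the new grid becomes the old grid, as in the question
    return gr
```

Note that a fresh new_gr is allocated each step; reusing one buffer across steps would leak stale boundary values into later iterations.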
    

    Verify -

    In [244]: # Inputs
         ...: gr = np.random.rand(40,50)
         ...: height,width = gr.shape
         ...: t = 1
         ...: 
    
    In [245]: np.allclose(org_app(gr,t),proposed_app(gr,t))
    Out[245]: True
    
    Timings -
    
    In [246]: # Inputs
         ...: gr = np.random.rand(400,500)
         ...: height,width = gr.shape
         ...: t = 1
         ...: 
    
    In [247]: %timeit org_app(gr,t)
    1 loops, best of 3: 2.13 s per loop
    
    In [248]: %timeit proposed_app(gr,t)
    10 loops, best of 3: 19.4 ms per loop
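
On the first bullet: converting a list of Python floats with np.array() is lossless, because Python floats and NumPy's default float64 are both IEEE-754 doubles. A systematic ~10% difference therefore points to an indexing mistake in the vectorized attempt rather than to the conversion. A quick check:

```python
import numpy as np

# Python floats are already 64-bit doubles, and np.array keeps them
# as float64, so the stored values are bit-for-bit identical.
pylist = [0.1, 0.2, 0.30000000000000004]
arr = np.array(pylist)

assert arr.dtype == np.float64
assert all(a == b for a, b in zip(arr, pylist))
```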
    
