Converting a nested loop calculation to Numpy for speedup
Part of my Python program contains the following piece of code, where a new grid is calculated based on data found in the old grid.
The grid is a two-dimensional list of floats. The code uses three for-loops:
for t in xrange(0, t, step):
    for h in xrange(1, height-1):
        for w in xrange(1, width-1):
            new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] + t * gr[h+1][w-1] - 2 * (gr[h][w-1] + t * gr[h-1][w])
    gr = new_gr
return gr
The code is extremely slow for a large grid and a large time t.
I've tried to use Numpy to speed up this code, by substituting the inner loop with:
J = np.arange(1, width-1)
new_gr[h][J] = gr[h][J] + gr[h][J-1] ...
But the results produced (the floats in the array) are about 10% smaller than their list-calculation counterparts.
What loss of accuracy is to be expected when converting lists of floats to a NumPy array of floats using np.array(pylist) and then doing a calculation?
How should I go about converting a triple for-loop to pretty and fast Numpy code? (or are there other suggestions for speeding up the code significantly?)
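On the accuracy part of the question: Python floats are already IEEE-754 doubles, and np.array on a list of them produces a float64 array whose elements are bit-for-bit identical to the originals, so the conversion itself cannot account for a 10% difference. A minimal check:

```python
import numpy as np
import random

# Build a small list-of-lists grid of Python floats
pylist = [[random.random() for _ in range(5)] for _ in range(4)]
arr = np.array(pylist)

print(arr.dtype)  # float64
# Every element survives the conversion exactly (bit-for-bit)
print(all(arr[i][j] == pylist[i][j] for i in range(4) for j in range(5)))  # True
```

The discrepancy is therefore more likely an indexing mistake in the vectorized expression than a precision loss.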
If gr is a list of floats, the first step if you are looking to vectorize with NumPy would be to convert gr to a NumPy array with np.array().
Next up, I am assuming that you have new_gr initialized with zeros of shape (height,width). The calculations being performed in the two innermost loops basically represent 2D convolution. So, you can use signal.convolve2d with an appropriate kernel. To decide on the kernel, we collect the scaling factors of the four cells that feed each output cell into a 3 x 3 stencil; because convolve2d flips the kernel (that is what distinguishes convolution from correlation), and because the output is shifted one column to line up with the original indexing, the factors end up in mirrored positions in the kernel below. Thus, you would have a vectorized solution with the two innermost loops being removed for better performance, like so -
import numpy as np
from scipy import signal

# Scaling factors arranged in the flipped order that convolve2d expects
kernel = np.array([[0,t,0],[1,-1,0],[1-2*t,0,0]])

# Initialize output array, run 2D convolution and set values into it
out = np.zeros((height,width))
out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2]
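As an equivalent formulation (a sketch): signal.correlate2d does not flip the kernel, so the scaling factors can be written in their natural stencil positions (above: 1-2*t, left: -1, centre: 1, below-left: t) and no column shift is needed. correlate_step is a name invented here for illustration:

```python
import numpy as np
from scipy import signal

def correlate_step(gr, t):
    # Stencil in natural (unflipped) orientation:
    # centre 1, left -1, above 1-2t, below-left t
    stencil = np.array([[0.0, 1 - 2.0 * t, 0.0],
                        [-1.0, 1.0, 0.0],
                        [t, 0.0, 0.0]])
    out = np.zeros_like(gr)
    # Interior cells only; boundary stays zero as in the loop version
    out[1:-1, 1:-1] = signal.correlate2d(gr, stencil, mode='same')[1:-1, 1:-1]
    return out
```

For a symmetric kernel, convolution and correlation coincide; this stencil is asymmetric, so the distinction matters.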
Verify output and runtime tests
Define functions:
def org_app(gr,t):
    new_gr = np.zeros((height,width))
    for h in xrange(1, height-1):
        for w in xrange(1, width-1):
            new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] + t * gr[h+1][w-1] - 2 * (gr[h][w-1] + t * gr[h-1][w])
    return new_gr
def proposed_app(gr,t):
    kernel = np.array([[0,t,0],[1,-1,0],[1-2*t,0,0]])
    out = np.zeros((height,width))
    out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2]
    return out
Verify -
In [244]: # Inputs
...: gr = np.random.rand(40,50)
...: height,width = gr.shape
...: t = 1
...:
In [245]: np.allclose(org_app(gr,t),proposed_app(gr,t))
Out[245]: True
Timings -
In [246]: # Inputs
...: gr = np.random.rand(400,500)
...: height,width = gr.shape
...: t = 1
...:
In [247]: %timeit org_app(gr,t)
1 loops, best of 3: 2.13 s per loop
In [248]: %timeit proposed_app(gr,t)
10 loops, best of 3: 19.4 ms per loop
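The solution above removes only the two innermost loops; the outer time loop still runs in Python, and since the kernel depends on t it must be rebuilt each iteration. A possible wrapper (a sketch; evolve and t_steps are names invented here, chosen because the question's original loop reuses the name t as both bound and loop variable; the kernel is written in the flipped order that convolve2d expects):

```python
import numpy as np
from scipy import signal

def evolve(gr, t_steps, step=1):
    # Vectorized replacement for the full triple loop of the question
    height, width = gr.shape
    for t in range(0, t_steps, step):
        # The kernel depends on t, so rebuild it every iteration
        kernel = np.array([[0.0, t, 0.0],
                           [1.0, -1.0, 0.0],
                           [1 - 2.0 * t, 0.0, 0.0]])
        new_gr = np.zeros((height, width))
        new_gr[1:-1, 1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1, :-2]
        gr = new_gr
    return gr
```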