Improve performance of a for loop in Python (possibly with numpy or numba)
I want to improve the performance of the for loop in this function.
    import numpy as np
    import random

    def play_game(row, n=1000000):
        """Play the game! This game is a kind of random walk.

        Arguments:
            row (int[]): row index to use in the p matrix for each step in the
                walk. The length of this array is the same as n.
            n (int): number of steps in the random walk
        """
        p = np.array([[ 0.499, 0.499, 0.499],
                      [ 0.099, 0.749, 0.749]])
        X0 = 100
        Y0 = X0 % 3
        X = np.zeros(n)
        tempX = X0
        Y = Y0
        for j in range(n):
            tempX = X[j] = tempX + 2 * (random.random() < p.item(row.item(j), Y)) - 1
            Y = tempX % 3
        return np.r_[X0, X]
The difficulty lies in the fact that the value of Y is computed at each step based on the value of X, and that Y is then used in the next step to update the value of X.
I wonder if there is some numpy trick that could make a big difference. Using Numba is fair game (I tried it but without much success). However, I do not want to use Cython.
A quick observation tells us that there is a data dependency between iterations in the function code. There are different kinds of data dependencies; the kind here is an indexing dependency, that is, the data selected at any iteration depends on the calculations of the previous iteration. This dependency seemed difficult to trace between iterations, so this post isn't really a vectorized solution. Rather, we will try to pre-compute as many of the values used within the loop as possible. The basic idea is to do the minimum amount of work inside the loop.
Here's a brief explanation of how we can proceed with the pre-calculations and thus arrive at a more efficient solution:
- Given the relatively small shape of p, from which row elements are to be extracted based on the input row, you can pre-select all those rows from p with p[row].
- For each iteration, you are calculating a random number. You can replace this with a random array that you set up before the loop, so those random values are precalculated as well.
- Based on the precalculated values thus far, you would have the column indices for all rows in p. Note that these column indices form a large ndarray containing all possible column indices; inside the loop, only one is chosen per iteration, based on the per-iteration calculations. Using the per-iteration column indices, you increment or decrement X0 to get the per-iteration output.
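The three precomputation steps above can be illustrated on a tiny input. This is only a minimal sketch: the values of row and randarr are made up for illustration, and p_rows is a helper name that does not appear in the answer's code.

```python
import numpy as np

p = np.array([[0.499, 0.419, 0.639],
              [0.099, 0.749, 0.319]])
row = np.array([0, 1, 1, 0])                   # hypothetical row choices, one per step
randarr = np.array([0.45, 0.20, 0.90, 0.60])   # hypothetical pre-drawn random numbers

# Step 1: pre-select the row of p used at each step -> shape (4, 3)
p_rows = p[row]

# Step 2: compare each pre-drawn random number against all three columns at once.
# randarr[:, None] has shape (4, 1) and broadcasts against p_rows.
signvals = 2 * (randarr[:, None] < p_rows) - 1   # +1 or -1 for every possible Y

# Step 3: for every step and every possible current Y, the Y of the next step
col_idx = (signvals + np.arange(3)) % 3

print(signvals)
# [[ 1 -1  1]
#  [-1  1  1]
#  [-1 -1 -1]
#  [-1 -1  1]]
print(col_idx)
# [[1 0 0]
#  [2 2 0]
#  [2 0 1]
#  [2 0 0]]
```

Only one entry per row of signvals and col_idx is actually used in the loop, namely the one in the column given by the current Y.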
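The implementation would look like this:

    import numpy as np

    # n (the number of steps) is assumed to be defined by the caller
    randarr = np.random.rand(n)
    p = np.array([[ 0.499, 0.419, 0.639],
                  [ 0.099, 0.749, 0.319]])

    def play_game_partvect(row,n,randarr,p):
        X0 = 100
        Y0 = X0 % 3
        # +1/-1 step for every iteration and every possible value of Y
        signvals = 2*(randarr[:,None] < p[row]) - 1
        # Next value of Y for every iteration and every possible current Y
        col_idx = (signvals + np.arange(3)) % 3
        Y = Y0
        currval = X0
        out = np.empty(n+1)
        out[0] = X0
        # The loop only looks up precomputed values
        for j in range(n):
            currval = currval + signvals[j,Y]
            out[j+1] = currval
            Y = col_idx[j,Y]
        return out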
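The col_idx lookup relies on the fact that Y is always X % 3, so the next Y can be computed from the current Y and the step alone, without knowing X. A small sketch of that check follows; next_y is a hypothetical helper introduced here for illustration, not part of the answer.

```python
def next_y(y, step):
    """Next value of Y given the current Y (0, 1 or 2) and a step of +1 or -1."""
    return (y + step) % 3

# Compare against computing Y directly from the updated X, for a range of X values.
mismatches = [
    (x, step)
    for x in range(-50, 300)
    for step in (-1, 1)
    if (x + step) % 3 != next_y(x % 3, step)
]
print(mismatches)  # prints []
```

This is why (signvals + np.arange(3)) % 3 gives the next column index for each possible current column.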
To verify against the original code, modify the original code like so:

    def play_game(row,n,randarr,p):
        X0 = 100
        Y0 = X0 % 3
        X = np.zeros(n)
        tempX = X0
        Y = Y0
        for j in range(n):
            tempX = X[j] = tempX + 2 * (randarr[j] < p.item(row.item(j), Y)) - 1
            Y = tempX % 3
        return np.r_[X0, X]
Please note that since this modified code also uses the precomputed random values, it is already considerably faster than the code in the question.
Runtime tests and output verification:

    In [2]: # Inputs
       ...: n = 1000
       ...: row = np.random.randint(0,2,(n))
       ...: randarr = np.random.rand(n)
       ...: p = np.array([[ 0.499, 0.419, 0.639],
       ...:               [ 0.099, 0.749, 0.319]])
       ...:

    In [3]: np.allclose(play_game_partvect(row,n,randarr,p),play_game(row,n,randarr,p))
    Out[3]: True

    In [4]: %timeit play_game(row,n,randarr,p)
    100 loops, best of 3: 11.6 ms per loop

    In [5]: %timeit play_game_partvect(row,n,randarr,p)
    1000 loops, best of 3: 1.51 ms per loop

    In [6]: # Inputs
       ...: n = 10000
       ...: row = np.random.randint(0,2,(n))
       ...: randarr = np.random.rand(n)
       ...: p = np.array([[ 0.499, 0.419, 0.639],
       ...:               [ 0.099, 0.749, 0.319]])
       ...:

    In [7]: np.allclose(play_game_partvect(row,n,randarr,p),play_game(row,n,randarr,p))
    Out[7]: True

    In [8]: %timeit play_game(row,n,randarr,p)
    10 loops, best of 3: 116 ms per loop

    In [9]: %timeit play_game_partvect(row,n,randarr,p)
    100 loops, best of 3: 14.8 ms per loop
Thus, we are seeing a speedup of more than 7.5x in both cases, which is not bad!
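Since the question allows Numba, the same loop with pre-drawn random numbers could also be handed to a JIT compiler. The following is only a sketch under assumptions, not something measured in this answer: play_game_numba and its x0 parameter are hypothetical names, no timings were taken, and the fallback decorator simply runs the plain Python loop when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        # Fallback (assumption for portability): no compilation, plain Python loop.
        return func

@njit
def play_game_numba(row, randarr, p, x0=100):
    """Random walk using pre-drawn random numbers; returns n + 1 values."""
    n = row.shape[0]
    out = np.empty(n + 1)
    out[0] = x0
    currval = x0
    y = x0 % 3
    for j in range(n):
        # Step up if the pre-drawn number falls below the selected probability.
        if randarr[j] < p[row[j], y]:
            step = 1
        else:
            step = -1
        currval += step
        out[j + 1] = currval
        # Adding 3 keeps the operand non-negative before taking the remainder.
        y = (y + step + 3) % 3
    return out

# Usage with small, reproducible inputs
rng = np.random.default_rng(0)
n = 500
row = rng.integers(0, 2, size=n)
randarr = rng.random(n)
p = np.array([[0.499, 0.419, 0.639],
              [0.099, 0.749, 0.319]])
out = play_game_numba(row, randarr, p)
```

The loop body is kept to scalar operations and array indexing, which is the kind of code a JIT compiler tends to handle well; whether it beats the partially vectorized version would have to be timed.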