Improve performance of a for loop in Python (possibly with numpy or numba)

Problem description

I want to improve the performance of the for loop in this function.

import numpy as np
import random

def play_game(row, n=1000000):
    """Play the game! This game is a kind of random walk.

    Arguments:
        row (int[]): row index to use in the p matrix for each step in the
                     walk. The length of this array is the same as n.

        n (int): number of steps in the random walk
    """
    p = np.array([[ 0.499,  0.499,  0.499],
                  [ 0.099,  0.749,  0.749]])
    X0 = 100
    Y0 = X0 % 3
    X = np.zeros(n)
    tempX = X0
    Y = Y0

    for j in range(n):
        tempX = X[j] = tempX + 2 * (random.random() < p.item(row.item(j), Y)) - 1
        Y = tempX % 3

    return np.r_[X0, X]

The difficulty lies in the fact that the value of Y is computed at each step based on the value of X and that Y is then used in the next step to update the value for X.

I wonder if there is some numpy trick that could make a big difference. Using Numba is fair game (I tried it but without much success). However, I do not want to use Cython.

Solution

A quick observation tells us that there is a data dependency between iterations in the function code. There are different kinds of data dependencies; the one here is an indexing dependency, i.e. the data selected at any iteration depends on the calculations of the previous iteration. This dependency is difficult to trace across iterations, so this post isn't really a vectorized solution. Instead, we pre-compute as many of the values used within the loop as possible. The basic idea is to do the minimum amount of work inside the loop.
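To make the dependency concrete: if the threshold did not depend on Y (a hypothetical p whose rows are constant), every step would be an independent +1/-1 and the whole walk would collapse into a single `np.cumsum`. The sketch below, with the hypothetical helpers `play_game_no_y` and `loop_reference`, checks that special case against a loop. With the actual p, the threshold at step j needs Y = X % 3 from step j-1, which is exactly what rules out this shortcut.

```python
import numpy as np

def play_game_no_y(row, randarr, prow_const, X0=100):
    """Hypothetical variant: prow_const[r] is the single threshold for row r,
    so the step no longer depends on Y."""
    steps = 2 * (randarr < prow_const[row]) - 1        # +1 or -1 for every step
    return np.concatenate(([X0], X0 + np.cumsum(steps)))

def loop_reference(row, randarr, p, X0=100):
    """Direct transcription of the original loop, with precomputed randoms."""
    X, out = X0, [X0]
    for j in range(len(row)):
        X += 2 * (randarr[j] < p[row[j], X % 3]) - 1   # threshold uses Y = X % 3
        out.append(X)
    return np.array(out)

# Usage: the rows of p_const are constant, so Y has no effect and cumsum suffices.
rng = np.random.default_rng(0)
n = 1000
row = rng.integers(0, 2, n)
randarr = rng.random(n)
p_const = np.array([[0.499, 0.499, 0.499],
                    [0.300, 0.300, 0.300]])
print(np.array_equal(play_game_no_y(row, randarr, p_const[:, 0]),
                     loop_reference(row, randarr, p_const)))   # True
```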

Here's a brief explanation of how to proceed with the pre-calculations and thus get a more efficient solution:

  • Given the relatively small shape of p, from which rows are extracted based on the input row, you can pre-select all those rows from p at once with p[row].

  • In each iteration, the original code calculates a random number. You can replace this with a random array that you set up before the loop, so those random values are precalculated as well.

  • With these precalculated values, you can compare every random number against all three thresholds of its row at once, which gives the +1/-1 step for every possible value of Y. Because X only moves by ±1, the next Y is (Y + step) % 3, so the column index for the next iteration can also be precomputed for every possible Y. These arrays hold all possible steps and column indices, and inside the loop only one entry is chosen per iteration, based on the current Y. Using the per-iteration column index, you increment or decrement the running value, starting from X0, to get the per-iteration output.
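The three pre-computation steps can be traced by hand on a tiny input. In the sketch below, `row` and `randarr` are illustrative values chosen for this example (n = 4), and p is the matrix used in the answer. The final loop checks that the table lookups reproduce `X % 3` at every step, which relies on the identity (X + s) % 3 == (X % 3 + s) % 3.

```python
import numpy as np

p = np.array([[0.499, 0.419, 0.639],
              [0.099, 0.749, 0.319]])
row = np.array([0, 1, 1, 0])               # illustrative: row of p for each of n = 4 steps
randarr = np.array([0.2, 0.5, 0.9, 0.45])  # illustrative precomputed uniforms

# Step 1: pre-select one row of p per step -> shape (n, 3)
prow = p[row]

# Step 2: compare each random number with all 3 thresholds of its row at once;
# column Y holds the step (+1 or -1) taken when the current Y equals that column.
signvals = 2 * (randarr[:, None] < prow) - 1
print(signvals.tolist())  # [[1, 1, 1], [-1, 1, -1], [-1, -1, -1], [1, -1, 1]]

# Step 3: next Y for every possible current Y, since (X + s) % 3 == (X % 3 + s) % 3
col_idx = (signvals + np.arange(3)) % 3
print(col_idx.tolist())   # [[1, 2, 0], [2, 2, 1], [2, 0, 1], [1, 0, 0]]

# The loop now only performs table lookups; check that they track X % 3.
X, Y = 100, 100 % 3
for j in range(len(row)):
    X += signvals[j, Y]
    Y = col_idx[j, Y]
    assert Y == X % 3
print(X)  # 100
```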

The implementation would look like this -

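import numpy as np

n = 1000                                   # number of steps (example value)
row = np.random.randint(0, 2, n)           # row of p used at each step
randarr = np.random.rand(n)                # precomputed uniform random numbers, one per step
p = np.array([[ 0.499,  0.419,  0.639],
              [ 0.099,  0.749,  0.319]])

def play_game_partvect(row,n,randarr,p):

    X0 = 100
    Y0 = X0 % 3

    # +1/-1 step for every step j and every possible current Y, shape (n, 3)
    signvals = 2*(randarr[:,None] < p[row]) - 1
    # Next Y for every j and every current Y: (X + s) % 3 == (X % 3 + s) % 3
    col_idx = (signvals + np.arange(3)) % 3

    Y = Y0
    currval = X0
    out = np.empty(n+1)
    out[0] = X0
    for j in range(n):
        currval = currval + signvals[j,Y]   # apply the step selected by the current Y
        out[j+1] = currval
        Y = col_idx[j,Y]                    # look up the next Y instead of computing currval % 3

    return out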
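One possible further step in the same "minimum work inside the loop" direction, sketched here as a hypothetical variant (`play_game_ysteps` is not from the original answer): the loop only follows Y through plain Python lists, and X is rebuilt afterwards with fancy indexing and `np.cumsum`. Whether this is actually faster than `play_game_partvect` has not been measured here; it is an assumption worth timing on your own inputs.

```python
import numpy as np

def play_game_ysteps(row, n, randarr, p):
    """Hypothetical variant of play_game_partvect: the loop only follows Y,
    and X is rebuilt afterwards with a cumulative sum."""
    X0 = 100
    signvals = 2 * (randarr[:, None] < p[row]) - 1   # (n, 3): step for every possible Y
    col_idx = (signvals + np.arange(3)) % 3          # (n, 3): next Y for every possible Y

    # Plain Python lists are usually cheaper to index element by element
    # than NumPy arrays, which matters inside a pure-Python loop.
    ci = col_idx.tolist()

    ys = [0] * n
    Y = X0 % 3
    for j in range(n):
        ys[j] = Y          # Y in effect at step j
        Y = ci[j][Y]       # Y for step j + 1

    steps = signvals[np.arange(n), ys]               # step actually taken at each j
    out = np.empty(n + 1)
    out[0] = X0
    out[1:] = X0 + np.cumsum(steps)
    return out

# Usage: compare with a direct transcription of the original loop.
rng = np.random.default_rng(1)
n = 1000
row = rng.integers(0, 2, n)
randarr = rng.random(n)
p = np.array([[0.499, 0.419, 0.639],
              [0.099, 0.749, 0.319]])

X, ref = 100, [100]
for j in range(n):
    X += 2 * (randarr[j] < p[row[j], X % 3]) - 1
    ref.append(X)
print(np.array_equal(play_game_ysteps(row, n, randarr, p), ref))  # True
```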

For verification against the original code, modify the original function so that it takes the same precomputed randarr and p as arguments -

def play_game(row,n,randarr,p):
    X0 = 100
    Y0 = X0 % 3
    X = np.zeros(n)
    tempX = X0
    Y = Y0
    for j in range(n):
        # randarr[j] replaces random.random(), so both versions see the same random numbers
        tempX = X[j] = tempX + 2 * (randarr[j] < p.item(row.item(j), Y)) - 1
        Y = tempX % 3
    return np.r_[X0, X]

Please note that since this modified version already uses precomputed random values, it should by itself give you a good speedup over the code in the question.

Runtime tests and output verification -

In [2]: # Inputs
   ...: n = 1000
   ...: row = np.random.randint(0,2,(n))
   ...: randarr = np.random.rand(n)
   ...: p = np.array([[ 0.499,  0.419,  0.639],
   ...:               [ 0.099,  0.749,  0.319]])
   ...: 

In [3]: np.allclose(play_game_partvect(row,n,randarr,p),play_game(row,n,randarr,p))
Out[3]: True

In [4]: %timeit play_game(row,n,randarr,p)
100 loops, best of 3: 11.6 ms per loop

In [5]: %timeit play_game_partvect(row,n,randarr,p)
1000 loops, best of 3: 1.51 ms per loop

In [6]: # Inputs
   ...: n = 10000
   ...: row = np.random.randint(0,2,(n))
   ...: randarr = np.random.rand(n)
   ...: p = np.array([[ 0.499,  0.419,  0.639],
   ...:               [ 0.099,  0.749,  0.319]])
   ...: 

In [7]: np.allclose(play_game_partvect(row,n,randarr,p),play_game(row,n,randarr,p))
Out[7]: True

In [8]: %timeit play_game(row,n,randarr,p)
10 loops, best of 3: 116 ms per loop

In [9]: %timeit play_game_partvect(row,n,randarr,p)
100 loops, best of 3: 14.8 ms per loop

Thus, we are seeing a speedup of about 7.5x+, not bad!