Python Numpy向量化嵌套的for循环用于组合 [英] Python Numpy vectorize nested for-loops for combinatorics

查看:445
本文介绍了Python Numpy向量化嵌套的for循环用于组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出一个nxn的实数正数组A,我试图找到2-d数组三行所有组合的元素取最小值中的最大值.使用for循环,结果是这样的:

Given an nxn array A of real positive numbers, I'm trying to find the minimum of the maximum of the element-wise minimum of all combinations of three rows of the 2-d array. Using for-loops, that comes out to something like this:

import numpy as np

n = 100
np.random.seed(2)
A = np.random.rand(n,n)
global_best = np.inf

for i in range(n-2):
    for j in range(i+1, n-1):
        for k in range(j+1, n):
            # find the maximum of the element-wise minimum of the three vectors
            local_best = np.amax(np.array([A[i,:], A[j,:], A[k,:]]).min(0))
            # if local_best is lower than global_best, update global_best
            if (local_best < global_best):
                global_best = local_best
                save_rows = [i, j, k]

print global_best, save_rows

对于n = 100,输出应为:

Out[]: 0.492652949593 [6, 41, 58]

尽管我有一种感觉,我可以使用Numpy向量化更快地完成此操作,并且一定会感谢在此方面的任何帮助.谢谢.

I have a feeling though that I could do this much faster using Numpy vectorization, and would certainly appreciate any help on doing this. Thanks.

推荐答案

不要尝试对不易于向量化的循环进行向量化.而是使用Numba之类的jit编译器或使用Cython.如果生成的代码更具可读性,则矢量化解决方案会很好,但是就性能而言,编译后的解决方案通常更快,或者在最坏的情况下,与矢量化解决方案一样快(BLAS例程除外).

Don't try to vectorize loops that are not simple to vectorize. Instead use a jit compiler like Numba or use Cython. Vectorized solutions are good if the resulting code is more readable, but in terms of performance a compiled solution is usually faster or in a worst case scenario as fast as a vectorized solution (except BLAS routines).

单线程示例

import numba as nb
import numpy as np

#Min and max library calls may be costly for only 3 values
@nb.njit()
def max_min_3(A,B,C):
  max_of_min=-np.inf
  for i in range(A.shape[0]):
    loc_min=A[i]
    if (B[i]<loc_min):
      loc_min=B[i]
    if (C[i]<loc_min):
      loc_min=C[i]

    if (max_of_min<loc_min):
      max_of_min=loc_min

  return max_of_min

@nb.njit()
def your_func(A):
  n=A.shape[0]
  save_rows=np.zeros(3,dtype=np.uint64)
  global_best=np.inf
  for i in range(n):
      for j in range(i+1, n):
          for k in range(j+1, n):
              # find the maximum of the element-wise minimum of the three vectors
              local_best = max_min_3(A[i,:], A[j,:], A[k,:])
              # if local_best is lower than global_best, update global_best
              if (local_best < global_best):
                  global_best = local_best
                  save_rows[0] = i
                  save_rows[1] = j
                  save_rows[2] = k

  return global_best, save_rows

单线程版本的性能

n=100
your_version: 1.56s
compiled_version: 0.0168s (92x speedup)

n=150
your_version: 5.41s
compiled_version: 0.08122s (66x speedup)

n=500
your_version: 283s
compiled_version: 8.86s (31x speedup)

第一个呼叫的固定开销约为0.3-1s.要对计算时间本身进行性能评估,请先调用一次,然后再测量性能.

The first call has a constant overhead of about 0.3-1s. For performance measurement of the calculation time itself, call it once and then measure performance.

只需更改一些代码,该任务就可以并行化.

With a few code changes this task can also be parallelized.

多线程示例

@nb.njit(parallel=True)
def your_func(A):
  n=A.shape[0]
  all_global_best=np.inf
  rows=np.empty((3),dtype=np.uint64)

  save_rows=np.empty((n,3),dtype=np.uint64)
  global_best_Temp=np.empty((n),dtype=A.dtype)
  global_best_Temp[:]=np.inf

  for i in range(n):
      for j in nb.prange(i+1, n):
          row_1=0
          row_2=0
          row_3=0
          global_best=np.inf
          for k in range(j+1, n):
              # find the maximum of the element-wise minimum of the three vectors

              local_best = max_min_3(A[i,:], A[j,:], A[k,:])
              # if local_best is lower than global_best, update global_best
              if (local_best < global_best):
                  global_best = local_best
                  row_1 = i
                  row_2 = j
                  row_3 = k

          save_rows[j,0]=row_1
          save_rows[j,1]=row_2
          save_rows[j,2]=row_3
          global_best_Temp[j]=global_best

      ind=np.argmin(global_best_Temp)
      if (global_best_Temp[ind]<all_global_best):
          rows[0] = save_rows[ind,0]
          rows[1] = save_rows[ind,1]
          rows[2] = save_rows[ind,2]
          all_global_best=global_best_Temp[ind]

  return all_global_best, rows

多线程版本的性能

n=100
your_version: 1.56s
compiled_version: 0.0078s (200x speedup)

n=150
your_version: 5.41s
compiled_version: 0.0282s (191x speedup)

n=500
your_version: 283s
compiled_version: 2.95s (96x speedup)

修改

在更新的Numba版本(通过Anaconda Python发行版安装)中,我必须手动安装 tbb 以获得有效的并行化.

In a newer Numba Version (installed through the Anaconda Python Distribution) I have to manually install tbb to get a working parallelization.

这篇关于Python Numpy向量化嵌套的for循环用于组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆