Vectorizing for loops with NumPy

Question

I'm relatively new to Python and I've got a nested for loop. Since the for loops take a while to run, I'm trying to figure out a way to vectorize this code so it can run faster.

In this case, coord is a 3-dimensional array where coord[x, 0, 0] and coord[x, 0, 1] are integers and coord[x, 0, 2] is either 0 or 1. H is a SciPy sparse matrix and x_dist, y_dist, z_dist, and a are all floats.

# x_dist, y_dist, and z_dist are floats
# coord is a num x 1 x 3 numpy array where num can go into the hundreds of thousands
num = coord.shape[0]    
H = sparse.lil_matrix((num, num))
for i in xrange(num):
    for j in xrange(num):
        if (np.absolute(coord[i, 0, 0] - coord[j, 0, 0]) <= 2 and
                (np.absolute(coord[i, 0, 1] - coord[j, 0, 1]) <= 1)):

            x = ((coord[i, 0, 0] * x_dist + coord[i, 0, 2] * z_dist) -
                 (coord[j, 0, 0] * x_dist + coord[j, 0, 2] * z_dist))

            y = (coord[i, 0, 1] * y_dist) - (coord[j, 0, 1] * y_dist)

            if a - 0.5 <= np.sqrt(x ** 2 + y ** 2) <= a + 0.5:
                H[i, j] = -2.7

I've also read that broadcasting with NumPy, while much faster, can lead to large amounts of memory usage from temporary arrays. Would it be better to go the vectorization route or try and use something like Cython?

Recommended answer

This is how I would vectorize your code; some discussion of the caveats follows:

import numpy as np
import scipy.sparse as sps

num = coord.shape[0]

# Boolean (num, num) mask of pairs that pass the integer-coordinate test.
idx = ((np.abs(coord[:, 0, 0] - coord[:, 0, 0, None]) <= 2) &
       (np.abs(coord[:, 0, 1] - coord[:, 0, 1, None]) <= 1))

# Compute distances only for the candidate pairs.
rows, cols = np.nonzero(idx)
x = ((coord[rows, 0, 0]-coord[cols, 0, 0]) * x_dist +
     (coord[rows, 0, 2]-coord[cols, 0, 2]) * z_dist)
y = (coord[rows, 0, 1]-coord[cols, 0, 1]) * y_dist
r2 = x*x + y*y

# Compare squared distances to avoid the sqrt (equivalent when a >= 0.5).
idx = ((a - 0.5)**2 <= r2) & (r2 <= (a + 0.5)**2)

rows, cols = rows[idx], cols[idx]
data = np.repeat(-2.7, len(rows))  # same value the original loop assigns

H = sps.coo_matrix((data, (rows, cols)), shape=(num, num)).tolil()

As you noted, the issues are going to come with the first idx array, as it will be of shape (num, num), so it will probably blow your memory to pieces if num is "into the hundreds of thousands."
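To put a rough number on that, here is a back-of-the-envelope estimate. The helper name `pairwise_array_bytes` and the size `num = 100_000` are made up for illustration; the only assumption is that each dense (num, num) array stores one item per pair.

```python
import numpy as np

def pairwise_array_bytes(num, dtype=np.bool_):
    """Bytes needed for one dense (num, num) array of the given dtype."""
    return num * num * np.dtype(dtype).itemsize

num = 100_000  # hypothetical size, at the low end of "hundreds of thousands"
mask_gb = pairwise_array_bytes(num) / 1e9            # the boolean idx array
diff_gb = pairwise_array_bytes(num, np.int64) / 1e9  # one int64 difference temporary
print(f"boolean mask: {mask_gb:.0f} GB, int64 temporary: {diff_gb:.0f} GB")
```

The boolean mask is the smaller of the two: the subtractions that feed it create integer temporaries of the same shape first, and those are several times larger if coord holds 64-bit integers.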

One potential solution is to break down your problem into manageable chunks. If you have a 100,000 element array, you can split it into 100 chunks of 1,000 elements, and run a modified version of the code above for each of the 10,000 combinations of chunks. You would only need a 1,000,000 element idx array (which you could pre-allocate and reuse for better performance), and you would have a loop of only 10,000 iterations, instead of the 10,000,000,000 of your current implementation. It is sort of a poor man's parallelization scheme, which you can actually improve on by having several of those chunks processed in parallel if you have a multi-core machine.
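A minimal sketch of that chunked approach might look like the following. The function name `build_H_chunked` and the `chunk` and `value` parameters are hypothetical, not part of the original answer. It assumes coord holds signed integers and that a + 0.5 is non-negative, and it returns a CSR matrix rather than LIL, which is a design choice you may want to change.

```python
import numpy as np
import scipy.sparse as sps

def build_H_chunked(coord, x_dist, y_dist, z_dist, a, chunk=1000, value=-2.7):
    """Build H block by block so that no (num, num) temporary is created."""
    num = coord.shape[0]
    cx, cy, cz = coord[:, 0, 0], coord[:, 0, 1], coord[:, 0, 2]
    # Squared bounds; clamp the lower one at 0 so a < 0.5 still behaves
    # like the original sqrt-based comparison.
    lo2 = max(a - 0.5, 0.0) ** 2
    hi2 = (a + 0.5) ** 2
    # Start with empty arrays so np.concatenate also works when num == 0.
    all_rows = [np.empty(0, dtype=np.intp)]
    all_cols = [np.empty(0, dtype=np.intp)]
    for i0 in range(0, num, chunk):
        # Column vectors for the row block.
        xi = cx[i0:i0 + chunk, None]
        yi = cy[i0:i0 + chunk, None]
        zi = cz[i0:i0 + chunk, None]
        for j0 in range(0, num, chunk):
            # Row vectors for the column block; broadcasting gives at most
            # (chunk, chunk) temporaries.
            dx = xi - cx[None, j0:j0 + chunk]
            dy = yi - cy[None, j0:j0 + chunk]
            dz = zi - cz[None, j0:j0 + chunk]
            near = (np.abs(dx) <= 2) & (np.abs(dy) <= 1)
            x = dx * x_dist + dz * z_dist
            y = dy * y_dist
            r2 = x * x + y * y
            r, c = np.nonzero(near & (lo2 <= r2) & (r2 <= hi2))
            # Shift block-local indices back to global ones.
            all_rows.append(r + i0)
            all_cols.append(c + j0)
    rows = np.concatenate(all_rows)
    cols = np.concatenate(all_cols)
    data = np.full(len(rows), value)
    return sps.coo_matrix((data, (rows, cols)), shape=(num, num)).tocsr()
```

Each (i, j) pair falls in exactly one block, so the COO triplets contain no duplicates. If you need item assignment afterwards, calling `.tolil()` on the result should give the same format as the original code. The independent blocks are also the natural unit to hand out to worker processes.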
