NumPy vectorization of a double Python for loop


Question

V is an (n, p) numpy array; typical dimensions are n ~ 10, p ~ 20000.

My current code is as follows:

import numpy as np

# F: coefficient array indexed as F[i, j]; only entries with j <= i are used
A = np.zeros(p)
for i in range(n):
    for j in range(i + 1):
        A += F[i, j] * V[i, :] * V[j, :]

How would I go about rewriting this to avoid the double Python for loop?
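Written out as a formula, with `F` read as a coefficient array indexed `F[i, j]` (the question uses it without defining it), the double loop computes, for every column index k of V:

```latex
A_k \;=\; \sum_{i=0}^{n-1} \sum_{j=0}^{i} F_{ij}\, V_{ik}\, V_{jk},
\qquad k = 0, \dots, p-1 .
```

Because only pairs with $j \le i$ appear, this equals $\sum_{i,j} L_{ij} V_{ik} V_{jk}$ with $L$ the lower triangle of $F$ (diagonal included): a single sum over all $(i, j)$ pairs, which is the shape of computation that array operations can handle without Python loops.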

Recommended answer

While Isaac's answer seems promising, since it removes the two nested for loops, it has to create an intermediate array M that is n times the size of your original V array. Python for loops are not cheap, but memory access isn't free either:
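To put a number on "n times the size", here is a small sketch for the typical sizes from the question, assuming float64 data (the `np.random.rand` default) and NumPy 1.20 or later for `np.broadcast_shapes`. It computes the size of the broadcast intermediate without actually allocating it:

```python
import numpy as np

n, p = 10, 20000  # typical sizes quoted in the question
V = np.random.rand(n, p)  # float64, 8 bytes per element

# Broadcasting (n, 1, p) against (1, n, p) yields an (n, n, p) array
M_shape = np.broadcast_shapes((n, 1, p), (1, n, p))
M_bytes = np.prod(M_shape) * V.itemsize

print(V.nbytes)  # 1600000  (~1.6 MB)
print(M_bytes)   # 16000000 (~16 MB), i.e. n times the size of V
```

That intermediate has to be written and then read back by the reduction, which is where the extra memory traffic comes from.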

import numpy as np

n = 10
p = 20000
V = np.random.rand(n, p)
F = np.random.rand(n, n)

def op_code(V, F):
    # The original double loop (Python 3: range instead of xrange)
    n, p = V.shape
    A = np.zeros(p)
    for i in range(n):
        for j in range(i + 1):
            A += F[i, j] * V[i, :] * V[j, :]
    return A

def isaac_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
    return M.sum((0, 1))
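The two lines in `isaac_code` that copy `F` and zero `F[np.triu_indices(n, 1)]` keep only the entries on or below the diagonal, i.e. exactly the `(i, j)` pairs with `j <= i` that the double loop visits. A quick check, assuming a random square `F`, that this masking matches `np.tril(F)`:

```python
import numpy as np

n = 10
F = np.random.rand(n, n)

# Masking as done in isaac_code: zero every entry strictly above the diagonal
F_masked = F.copy()
F_masked[np.triu_indices(n, 1)] = 0

# np.tril (default k=0) keeps the diagonal and everything below it
print(np.array_equal(F_masked, np.tril(F)))  # True
```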

If you now take both for a test ride:

In [20]: np.allclose(isaac_code(V, F), op_code(V, F))
Out[20]: True

In [21]: %timeit op_code(V, F)
100 loops, best of 3: 3.18 ms per loop

In [22]: %timeit isaac_code(V, F)
10 loops, best of 3: 24.3 ms per loop
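`%timeit` is an IPython magic. In a plain Python script, the standard-library `timeit` module gives comparable measurements. A minimal sketch, assuming Python 3 (so `range` instead of `xrange`) and the same kind of random inputs; absolute times will differ from the session above depending on the machine and NumPy build:

```python
import timeit

import numpy as np

n, p = 10, 20000
V = np.random.rand(n, p)
F = np.random.rand(n, n)


def op_code(V, F):
    # Double Python loop over the lower triangle of F
    n, p = V.shape
    A = np.zeros(p)
    for i in range(n):
        for j in range(i + 1):
            A += F[i, j] * V[i, :] * V[j, :]
    return A


def isaac_code(V, F):
    # Broadcasting version that builds an (n, n, p) intermediate
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
    return M.sum((0, 1))


assert np.allclose(op_code(V, F), isaac_code(V, F))

for func in (op_code, isaac_code):
    # Best of 3 repeats, each running the function 10 times
    times = timeit.repeat(lambda: func(V, F), number=10, repeat=3)
    print(f"{func.__name__}: {min(times) / 10 * 1e3:.2f} ms per call")
```

Taking the minimum of the repeats, as `%timeit` does with its "best of 3", reduces the influence of other processes on the measurement.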

So removing the for loops costs you roughly an 8x slowdown (24.3 ms vs 3.18 ms). Not a very good thing... At this point you may even want to consider whether a function that takes about 3 ms to evaluate needs any further optimization. In case it does, a small improvement can be had by using np.einsum:

def einsum_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    return np.einsum('ij,ik,jk->k', F, V, V)
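In the subscript string `'ij,ik,jk->k'`, the indices `i` and `j` appear in the inputs but not in the output, so `np.einsum` sums over them: `A[k]` is the sum over all `i, j` of `F[i, j] * V[i, k] * V[j, k]`. With the upper triangle of `F` zeroed, that is exactly the double loop. A small sanity check against explicit Python loops, assuming toy sizes so the loops stay fast:

```python
import numpy as np

n, p = 4, 6  # toy sizes so the triple loop stays cheap
V = np.random.rand(n, p)
F = np.tril(np.random.rand(n, n))  # lower triangle only, as in einsum_code

A_einsum = np.einsum('ij,ik,jk->k', F, V, V)

# Spell out the subscripts: sum over i and j, which are dropped from the output
A_loops = np.zeros(p)
for k in range(p):
    for i in range(n):
        for j in range(n):
            A_loops[k] += F[i, j] * V[i, k] * V[j, k]

print(np.allclose(A_einsum, A_loops))  # True
```

Unlike the broadcasting version, `np.einsum` does not need to materialize the full (n, n, p) product here, which is consistent with it not suffering the same slowdown.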

Now:

In [23]: np.allclose(einsum_code(V, F), op_code(V, F))
Out[23]: True

In [24]: %timeit einsum_code(V, F)
100 loops, best of 3: 2.53 ms per loop

So that's roughly a 20% speedup (2.53 ms vs 3.18 ms), but it introduces code that may very well be less readable than your for loops. I would say it's not worth it...
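The "8x" and "20%" figures follow directly from the `%timeit` output in the sessions; a trivial check of that arithmetic:

```python
# Timings copied from the %timeit output (milliseconds per call)
loop_ms, broadcast_ms, einsum_ms = 3.18, 24.3, 2.53

slowdown = broadcast_ms / loop_ms         # about 7.6, i.e. "roughly 8x"
saving = (loop_ms - einsum_ms) / loop_ms  # about 0.20, i.e. "roughly 20%"

print(f"{slowdown:.1f}x slower, {saving:.0%} less time")  # 7.6x slower, 20% less time
```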

