Quickly compute eigenvectors for each element of an array in Python

Question

I want to compute eigenvectors for an array of data (in my actual case, a cloud of polygons).

To do so, I wrote this function:

import numpy as np

def eigen(data):
    eigenvectors = []
    eigenvalues  = []

    for d in data:
        # compute covariance for each triangle
        cov = np.cov(d, ddof=0, rowvar=False)

        # compute eigen vectors   
        vals, vecs = np.linalg.eig(cov)
        eigenvalues.append(vals)
        eigenvectors.append(vecs)

    return np.array(eigenvalues), np.array(eigenvectors)

Running this on some test data:

import cProfile
triangles = np.random.random((10**4,3,3,)) # 10k 3D triangles
cProfile.run('eigen(triangles)') # 550005 function calls in 0.933 seconds

This works fine, but it gets very slow because of the iteration loop. Is there a faster way to compute the data I need without iterating over the array? If not, can anyone suggest ways to speed it up?

Recommended answer

Hack it!

Well, I hacked into the definition of the covariance function and plugged in the stated inputs, ddof=0 and rowvar=False. As it turns out, everything reduces to just three lines:

nC = m.shape[0]  # m is the 2D input array; with rowvar=False its rows are the observations
X = m - m.mean(0)
out = np.dot(X.T, X)/nC
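
As a quick sanity check, the three lines can be compared against np.cov on a non-square input, where the number of rows (observations) differs from the number of columns (variables). This is only a sketch: the helper name cov_rows_as_observations and the random test array are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def cov_rows_as_observations(m):
    """Population covariance of a 2D array whose rows are observations.

    Intended to match np.cov(m, ddof=0, rowvar=False).
    """
    n_obs = m.shape[0]             # number of observations (rows)
    X = m - m.mean(0)              # center each column (variable)
    return np.dot(X.T, X) / n_obs

rng = np.random.default_rng(0)
m = rng.random((5, 3))             # 5 observations of 3 variables
expected = np.cov(m, ddof=0, rowvar=False)
result = cov_rows_as_observations(m)
print(np.allclose(result, expected))
```

With a 3x3 input, as in the question, the row and column counts coincide, so the choice of divisor only becomes visible on non-square data.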

To extend this to our 3D array case, I wrote a loopy version that repeats these three lines for each 2D section of the 3D input array, like so:

for i,d in enumerate(m):

    # Using np.cov :
    org_cov = np.cov(d, ddof=0, rowvar=False)

    # Using earlier 2D array hacked version :
    nC = m[i].shape[0]
    X = m[i] - m[i].mean(0,keepdims=True)
    hacked_cov = np.dot(X.T, X)/nC

Boost it!

We need to speed up the last three lines there. The computation of X across all iterations can be done with broadcasting:

diffs = data - data.mean(1,keepdims=True)
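
To see why keepdims=True matters here, the following sketch prints the shapes involved and checks the broadcast result against a per-triangle loop. The small random array is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((4, 3, 3))          # 4 triangles, 3 vertices, 3 coordinates

means = data.mean(1, keepdims=True)   # shape (4, 1, 3): one centroid per triangle
diffs = data - means                  # (4, 3, 3) - (4, 1, 3) broadcasts over axis 1

# Same result as centering each triangle inside a Python loop
looped = np.array([d - d.mean(0) for d in data])
print(means.shape, diffs.shape, np.allclose(diffs, looped))
```

Without keepdims=True the mean would have shape (4, 3), which would not line up with the triangle axis as intended when subtracted from data.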

Next, the dot products for all iterations could be computed with a transpose followed by np.dot, but that transpose could be a costly affair for such a multi-dimensional array. A better alternative exists in np.einsum, like so:

cov3D = np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
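
The subscripts 'ijk,ijl->ikl' say: for each block i, sum over the observation axis j, which is the batched equivalent of X.T times X. The sketch below checks that reading against an explicit transpose-and-matmul form and against np.cov on each block; the test array is an illustrative assumption, and the matmul form is shown only for comparison, not as a performance claim.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((6, 3, 3))
diffs = data - data.mean(1, keepdims=True)

# einsum: for each block i, contract over the observation axis j
cov_einsum = np.einsum('ijk,ijl->ikl', diffs, diffs) / data.shape[1]

# Equivalent explicit form: batched X.T @ X
cov_matmul = np.matmul(diffs.transpose(0, 2, 1), diffs) / data.shape[1]

# Reference: np.cov applied block by block
cov_loop = np.array([np.cov(d, ddof=0, rowvar=False) for d in data])

print(np.allclose(cov_einsum, cov_matmul), np.allclose(cov_einsum, cov_loop))
```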

Use it!

So, to sum up:

for d in data:
    # compute covariance for each triangle
    cov = np.cov(d, ddof=0, rowvar=False)

This can be pre-computed like so:

diffs = data - data.mean(1,keepdims=True)
cov3D = np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]

These pre-computed values can then be used inside the loop to compute the eigenvectors, like so:

for i,d in enumerate(data):
    # Directly use pre-computed covariances for each triangle
    vals, vecs = np.linalg.eig(cov3D[i])
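
The remaining loop may be avoidable as well, since the np.linalg eigen-solvers in current NumPy versions accept stacked arrays of shape (..., M, M). The sketch below goes beyond the original answer: it assumes np.linalg.eigh is acceptable because covariance matrices are symmetric. Note that eigh returns eigenvalues in ascending order, so the ordering and the signs of the eigenvectors may differ from what np.linalg.eig gives.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 3, 3))

diffs = data - data.mean(1, keepdims=True)
cov3D = np.einsum('ijk,ijl->ikl', diffs, diffs) / data.shape[1]

# One call for all matrices: vals has shape (1000, 3), vecs has shape (1000, 3, 3)
vals, vecs = np.linalg.eigh(cov3D)

# Check the defining relation cov @ v = lambda * v for every block;
# the columns of vecs[i] are the eigenvectors of cov3D[i]
lhs = np.matmul(cov3D, vecs)
rhs = vecs * vals[:, None, :]
print(np.allclose(lhs, rhs))
```

If the exact ordering produced by np.linalg.eig matters downstream, the results would need to be re-sorted accordingly.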

Test it!

Here are some runtime tests to assess the effect of pre-computing the covariance results:

In [148]: def original_app(data):
     ...:     cov = np.empty(data.shape)
     ...:     for i,d in enumerate(data):    
     ...:         # compute covariance for each triangle
     ...:         cov[i] = np.cov(d, ddof=0, rowvar=False)
     ...:     return cov
     ...: 
     ...: def vectorized_app(data):            
     ...:     diffs = data - data.mean(1,keepdims=True)
     ...:     return np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
     ...: 

In [149]: data = np.random.randint(0,10,(1000,3,3))

In [150]: np.allclose(original_app(data),vectorized_app(data))
Out[150]: True

In [151]: %timeit original_app(data)
10 loops, best of 3: 64.4 ms per loop

In [152]: %timeit vectorized_app(data)
1000 loops, best of 3: 1.14 ms per loop

In [153]: data = np.random.randint(0,10,(5000,3,3))

In [154]: np.allclose(original_app(data),vectorized_app(data))
Out[154]: True

In [155]: %timeit original_app(data)
1 loops, best of 3: 324 ms per loop

In [156]: %timeit vectorized_app(data)
100 loops, best of 3: 5.67 ms per loop
