计算两个点阵列之间成对角度的矩阵 [英] Compute matrix of pairwise angles between two arrays of points

查看:103
本文介绍了计算两个点阵列之间成对角度的矩阵的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个点向量, x y ,形状为(n,p)(m,p).例如:

  x = np.array([[0.,-0.16341,0.98656],[-0.05937,-0.25205、0.96589],[0.05937,-0.25205,0.96589],[-0.11608,-0.33488、0.93508],[0.,-0.33416,0.94252]])y = np.array([[0.,-0.36836,0.92968],[-0.12103,-0.54558、0.82928],[0.12103,-0.54558、0.82928]]) 

我想计算一个(n,m)大小的矩阵,其中包含两个点之间的角度,即解决方案

有多种方法可以做到这一点:

 将numpy.linalg导入为la从scipy.spatial导入距离为dist#手动def method0(x,y):dotprod_mat = np.dot(x,y.T)costheta = dotprod_mat/la.norm(x,axis = 1)[:, np.newaxis]costheta/= la.norm(y,axis = 1)返回np.arccos(costheta)#使用einsumdef method1(x,y):dotprod_mat = np.einsum('ij,kj-> ik',x,y)costheta = dotprod_mat/la.norm(x,axis = 1)[:, np.newaxis]costheta/= la.norm(y,axis = 1)返回np.arccos(costheta)#使用scipy.spatial.cdist(单线)def method2(x,y):costheta = 1-dist.cdist(x,y,'cosine')返回np.arccos(costheta)#意识到您的数组x和y已经被规范化,这意味着您可以#进一步优化method1def method3(x,y):costheta = np.einsum('ij,kj-> ik',x,y)#直接给出costheta,因为#|| x ||= || y ||= 1返回np.arccos(costheta) 

(n,m)=(1212,252)的计时结果:

 >>>%timeit theta =方法0(x,y)100次循环,每循环3:11.1毫秒最佳>>>%timeit theta = method1(x,y)100次循环,最好为3:每个循环10.8毫秒>>>%timeit theta = method2(x,y)100个循环,最佳3:每个循环12.3毫秒>>>%timeit theta = method3(x,y)100次循环,最佳3:每个循环9.42毫秒 

时序的差异随着元素数量的增加而减小.对于(n,m)=(6252,1212):

 >>>%timeit -n10 theta = method0(x,y)10个循环,最佳3:每个循环365毫秒>>>%timeit -n10 theta = method1(x,y)10个循环,每个循环最好3:358毫秒>>>%timeit -n10 theta = method2(x,y)10个循环,最佳3:每个循环384毫秒>>>%timeit -n10 theta = method3(x,y)10次​​循环,最佳3:每个循环314毫秒 

但是,如果您省去了 np.arccos 步骤,即假设您只需使用 costheta 进行管理,并且不需要 > theta 本身,然后:

 >>>%timeit costheta = np.einsum('ij,kj-> ik',x,y)10个循环,最佳3:61.3 ms/循环>>>%timeit costheta = 1-dist.cdist(x,y,'cosine')10次​​循环,最佳3:每个循环124毫秒>>>%timeit costheta = dist.cdist(x,y,'cosine')10个循环,每个循环最好3:112毫秒 

这是针对(6252,1212)的情况.因此,实际上 np.arccos 占用了80%的时间.在这种情况下,我发现 np.einsum dist.cdist 快得多.因此,您肯定要使用 einsum .

摘要: theta 的结果基本相似,但是 np.einsum 对我来说最快,尤其是当您不需要多余的计算时规范.尽量避免计算 theta ,而只使用 costheta .

注意:我没有提到的重要一点是浮点精度的有限会导致 np.arccos 给出 nan 价值观.自然, method [0:3] 用于处理未正确归一化的 x y 值.但是 method3 给出了一些 nan .我用预归一化解决了这个问题,这会自然破坏使用 method3 的任何收益,除非您需要为一小套预归一化的矩阵进行多次计算(无论出于何种原因)./p>

I have two vectors of points, x and y, shaped (n, p) and (m, p) respectively. As an example:

x = np.array([[ 0.     , -0.16341,  0.98656],
              [-0.05937, -0.25205,  0.96589],
              [ 0.05937, -0.25205,  0.96589],
              [-0.11608, -0.33488,  0.93508],
              [ 0.     , -0.33416,  0.94252]])
y = np.array([[ 0.     , -0.36836,  0.92968],
              [-0.12103, -0.54558,  0.82928],
              [ 0.12103, -0.54558,  0.82928]])

I want to compute an (n, m)-sized matrix that contains the angles between the two points, a la this question. That is, a vectorized version of:

theta = np.array(
            [ np.arccos(np.dot(i, j) / (la.norm(i) * la.norm(j)))
                 for i in x for j in y ]
        ).reshape((n, m))

Note: n and m can be of the order of ~10000 each.

解决方案

There are multiple ways to do this:

import numpy.linalg as la
from scipy.spatial import distance as dist

# Manually
def method0(x, y):
    dotprod_mat = np.dot(x,  y.T)
    costheta = dotprod_mat / la.norm(x, axis=1)[:, np.newaxis]
    costheta /= la.norm(y, axis=1)
    return np.arccos(costheta)

# Using einsum
def method1(x, y):
    dotprod_mat = np.einsum('ij,kj->ik', x, y)
    costheta = dotprod_mat / la.norm(x, axis=1)[:, np.newaxis]
    costheta /= la.norm(y, axis=1)
    return np.arccos(costheta)

# Using scipy.spatial.cdist (one-liner)
def method2(x, y):
    costheta = 1 - dist.cdist(x, y, 'cosine')
    return np.arccos(costheta)

# Realize that your arrays `x` and `y` are already normalized, meaning you can
# optimize method1 even more
def method3(x, y):
    costheta = np.einsum('ij,kj->ik', x, y) # Directly gives costheta, since
                                            # ||x|| = ||y|| = 1
    return np.arccos(costheta)

Timing results for (n, m) = (1212, 252):

>>> %timeit theta = method0(x, y)
100 loops, best of 3: 11.1 ms per loop
>>> %timeit theta = method1(x, y)
100 loops, best of 3: 10.8 ms per loop
>>> %timeit theta = method2(x, y)
100 loops, best of 3: 12.3 ms per loop
>>> %timeit theta = method3(x, y)
100 loops, best of 3: 9.42 ms per loop

The difference in timing reduces as the number of elements increases. For (n, m) = (6252, 1212):

>>> %timeit -n10 theta = method0(x, y)
10 loops, best of 3: 365 ms per loop
>>> %timeit -n10 theta = method1(x, y)
10 loops, best of 3: 358 ms per loop
>>> %timeit -n10 theta = method2(x, y)
10 loops, best of 3: 384 ms per loop
>>> %timeit -n10 theta = method3(x, y)
10 loops, best of 3: 314 ms per loop

However, if you leave out the np.arccos step, i.e., suppose you could manage with just costheta, and didn't need theta itself, then:

>>> %timeit costheta = np.einsum('ij,kj->ik', x, y)
10 loops, best of 3: 61.3 ms per loop
>>> %timeit costheta = 1 - dist.cdist(x, y, 'cosine')
10 loops, best of 3: 124 ms per loop
>>> %timeit costheta = dist.cdist(x, y, 'cosine')
10 loops, best of 3: 112 ms per loop

This is for the case of (6252, 1212). So actually np.arccos is taking up 80% of the time. In this case I find that np.einsum is much faster than dist.cdist. So you definitely want to be using einsum.

Summary: Results for theta are largely similar, but np.einsum is fastest for me, especially when you're not extraneously computing the norms. Try to avoid computing theta and working with just costheta.

Note: An important point I didn't mention is that finiteness of floating-point precision can cause np.arccos to give nan values. method[0:3] worked for values of x and y that hadn't been properly normalized, naturally. But method3 gave a few nans. I fixed this with pre-normalization, which naturally destroys any gain in using method3, unless you need to do this computation many many times for a small set of pre-normalized matrices (for whatever reason).

这篇关于计算两个点阵列之间成对角度的矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆