Find minimum cosine distance between two matrices

Problem description

I have two 2D np.arrays, let's call them A and B, both having the same shape. For every vector in 2D array A I need to find the vector in matrix B that has the minimum cosine distance. To do this I just have a double for loop, inside of which I try to find the minimum value. So basically I do the following:

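from scipy.spatial.distance import cosine

# For each row of A, find the index of the row of B with the smallest cosine distance.
l, res = A.shape[0], []
for i in range(l):
    # Inner loop: min() compares (distance, index) tuples, so it picks the smallest distance.
    minimum = min((cosine(A[i], B[j]), j) for j in range(B.shape[0]))
    res.append(minimum[1])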

In the code above, one of the loops is hidden inside a comprehension. Everything works fine, but the double for loop makes it too slow (I tried to rewrite it as a double comprehension, which made things a little faster, but it is still slow).

I believe that there is a NumPy function that can do this faster (using some linear algebra).

So is there a way to achieve what I want faster?

Recommended answer

From the cosine docs we have the following info -

scipy.spatial.distance.cosine(u, v) : Computes the Cosine distance between 1-D arrays.

The Cosine distance between u and v is defined as

$$1 - \frac{u \cdot v}{\|u\|_2 \, \|v\|_2}$$

where u⋅v is the dot product of u and v.
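As a quick sanity check of the formula, here is a minimal NumPy-only sketch for a single pair of 1-D vectors. The helper name `cosine_distance_1d` is hypothetical and is not part of SciPy; it simply spells out the definition above.

```python
import numpy as np

def cosine_distance_1d(u, v):
    """Hypothetical helper: 1 - (u . v) / (||u||_2 * ||v||_2) for two 1-D arrays."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_distance_1d([1, 0], [0, 1]))   # orthogonal vectors: 1.0
print(cosine_distance_1d([1, 0], [-1, 0]))  # opposite directions: 2.0
print(cosine_distance_1d([1, 1], [2, 2]))   # same direction: close to 0.0 (floating point)
```

Note that only direction matters: scaling either vector by a positive factor leaves the distance unchanged.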

Using the above formula, we would have one vectorized solution using NumPy's broadcasting capability, like so -

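import numpy as np

# Get the dot products, L2 norms and thus cosine distances
dots = np.dot(A, B.T)                                           # (n, m) pairwise dot products
l2norms = np.sqrt(((A**2).sum(1)[:, None]) * ((B**2).sum(1)))  # (n, m) products of row norms
cosine_dists = 1 - (dots / l2norms)                             # (n, m) cosine distances

# Get min values (if needed) and corresponding indices along the rows for res.
# Zero L2 norms produce NaN entries, so use nanmin and nanargmin to skip them.
minval = np.nanmin(cosine_dists, axis=1)
# nanargmin raises on all-NaN rows (e.g. an all-zero row of A), so point those rows at index 0.
cosine_dists[np.isnan(cosine_dists).all(1), 0] = 0
res = np.nanargmin(cosine_dists, axis=1)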
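The `l2norms` line relies on broadcasting a column of squared norms with shape (n, 1) against a row of squared norms with shape (m,), which yields an (n, m) matrix aligned with `dots`. A small shape check with made-up random inputs (the sizes and seed are arbitrary) makes that explicit:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, illustrative sizes only
A = rng.random((3, 5))          # n = 3 rows
B = rng.random((4, 5))          # m = 4 rows

sq_a = (A**2).sum(1)[:, None]   # shape (3, 1): squared norm of each row of A
sq_b = (B**2).sum(1)            # shape (4,): squared norm of each row of B
l2norms = np.sqrt(sq_a * sq_b)  # broadcasts to shape (3, 4)

print(sq_a.shape, sq_b.shape, l2norms.shape)  # (3, 1) (4,) (3, 4)
# Entry [i, j] is ||A[i]|| * ||B[j]||, the denominator for np.dot(A, B.T)[i, j].
outer_norms = np.outer(np.linalg.norm(A, axis=1), np.linalg.norm(B, axis=1))
print(np.allclose(l2norms, outer_norms))
```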

Runtime test -

In [81]: def org_app(A,B):
    ...:    l, res, minval = A.shape[0], [], []
    ...:    for i in xrange(l):
    ...:        minimum = min((cosine(A[i], B[j]), j) for j in xrange(l))
    ...:        res.append(minimum[1])
    ...:        minval.append(minimum[0])
    ...:    return res, minval
    ...: 
    ...: def vectorized(A,B):
    ...:     dots = np.dot(A,B.T)
    ...:     l2norms = np.sqrt(((A**2).sum(1)[:,None])*((B**2).sum(1)))
    ...:     cosine_dists = 1 - (dots/l2norms)
    ...:     minval = np.nanmin(cosine_dists,axis=1)
    ...:     cosine_dists[np.isnan(cosine_dists).all(1),0] = 0
    ...:     res = np.nanargmin(cosine_dists,axis=1)
    ...:     return res, minval
    ...: 

In [82]: A = np.random.rand(400,500)
    ...: B = np.random.rand(400,500)
    ...: 

In [83]: %timeit org_app(A,B)
1 loops, best of 3: 10.8 s per loop

In [84]: %timeit vectorized(A,B)
10 loops, best of 3: 145 ms per loop

Verify results -

In [86]: x1, y1 = org_app(A, B)
    ...: x2, y2 = vectorized(A, B)
    ...: 

In [87]: np.allclose(np.asarray(x1),x2)
Out[87]: True

In [88]: np.allclose(np.asarray(y1)[~np.isnan(np.asarray(y1))],y2[~np.isnan(y2)])
Out[88]: True
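A variant of the same idea, sketched here as an alternative rather than as the answer's code: L2-normalize the rows of A and B first, so that a single matrix product yields the cosine similarities. It assumes A has shape (n, d) and B has shape (m, d). The function name `min_cosine_distance`, and the choice to return index -1 with a NaN distance for rows that have no valid match (instead of index 0 as above), are illustrative assumptions.

```python
import numpy as np

def min_cosine_distance(A, B):
    """Hypothetical helper: for each row of A, the index and cosine distance of the nearest row of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Row-wise L2 norms kept as column vectors so the division broadcasts per row.
    a_norm = np.linalg.norm(A, axis=1, keepdims=True)
    b_norm = np.linalg.norm(B, axis=1, keepdims=True)
    # All-zero rows become 0/0 = NaN rows; silence the warning for that case.
    with np.errstate(invalid="ignore", divide="ignore"):
        A_unit = A / a_norm
        B_unit = B / b_norm
    # (n, d) @ (d, m) -> (n, m) matrix of cosine distances.
    dists = 1.0 - A_unit @ B_unit.T
    # Treat NaN entries as +inf so argmin skips them.
    filled = np.where(np.isnan(dists), np.inf, dists)
    idx = np.argmin(filled, axis=1)
    mins = filled[np.arange(filled.shape[0]), idx]
    # Rows whose every candidate was NaN: report index -1 and distance NaN.
    no_match = np.isinf(mins)
    idx[no_match] = -1
    mins[no_match] = np.nan
    return idx, mins

A = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
B = np.array([[0.0, 3.0], [5.0, 0.0]])
print(min_cosine_distance(A, B))  # indices [1, 0, -1]; distances [0.0, 0.0, nan]
```

Normalizing first avoids building a separate (n, m) `l2norms` matrix, although the (n, m) distance matrix is still materialized, so memory grows with n × m.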
