Get indices of top N values in 2D numpy ndarray or numpy matrix


Question

I have an array of N-dimensional vectors.

data = np.array([[5, 6, 1], [2, 0, 8], [4, 9, 3]])

In [1]: data
Out[1]:
array([[5, 6, 1],
       [2, 0, 8],
       [4, 9, 3]])

I'm using sklearn's pairwise_distances function to compute a matrix of distance values. Note that this matrix is symmetric about the diagonal.

dists = pairwise_distances(data)

In [2]: dists
Out[2]:
array([[  0.        ,   9.69535971,   3.74165739],
       [  9.69535971,   0.        ,  10.48808848],
       [  3.74165739,  10.48808848,   0.        ]])

I need the indices corresponding to the top N values in this matrix dists, because these indices correspond to the pairwise indices in data that represent the vectors with the greatest distances between them.

I have tried np.argmax(np.max(dists, axis=1)) to get the index of the max value in each row, and np.argmax(np.max(dists, axis=0)) to get the index of the max value in each column, but note that:

In [3]: np.argmax(np.max(dists, axis=1))
Out[3]: 1

In [4]: np.argmax(np.max(dists, axis=0))
Out[4]: 1

and:

In [5]: dists[1, 1]
Out[5]: 0.0

Because the matrix is symmetric about the diagonal, and because argmax returns the first index at which it finds the max value, both calls return the same number. I end up with the cell on the diagonal whose row and column match where the max values are stored, instead of the row and column of the top values themselves.
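To make the behaviour described above concrete, here is a minimal sketch that reproduces it step by step. It assumes NumPy only: the distance matrix is rebuilt with np.linalg.norm rather than sklearn's pairwise_distances (an assumption made so the snippet is self-contained; for Euclidean distance the values should match those shown above).

```python
import numpy as np

data = np.array([[5, 6, 1], [2, 0, 8], [4, 9, 3]])

# Assumption: Euclidean distances computed by broadcasting,
# standing in for sklearn's pairwise_distances(data).
dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)

# np.max collapses each row (or column) to a single value,
# so the column (or row) position of that value is lost.
row_max = np.max(dists, axis=1)   # largest distance in each row
col_max = np.max(dists, axis=0)   # largest distance in each column

# argmax over those 1-D arrays yields one row number and one column number.
row_idx = int(np.argmax(row_max))
col_idx = int(np.argmax(col_max))

# By symmetry both are the same, so the pair points at the diagonal.
print(row_idx, col_idx, dists[row_idx, col_idx])
```

The two argmax calls each answer "which row (or column) contains the largest value", not "where in the matrix is the largest value", which is why combining them lands on a diagonal cell.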

At this point I'm sure I could write some more code to find the values I'm looking for, but surely there is an easier way to do what I'm trying to do. So I have two questions that are more or less equivalent:

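1. How can I find the indices corresponding to the top N values in a matrix?
2. How can I find the vectors with the top N pairwise distances from an array of vectors?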

Recommended answer

I'd ravel, argsort, and then unravel. I'm not claiming this is the best way, only that it's the first way that occurred to me, and I'll probably delete it in shame after someone posts something more obvious. :-)

That said (choosing the top 2 values, arbitrarily):

In [73]: dists = sklearn.metrics.pairwise_distances(data)

In [74]: dists[np.tril_indices_from(dists, -1)] = 0

In [75]: dists
Out[75]: 
array([[  0.        ,   9.69535971,   3.74165739],
       [  0.        ,   0.        ,  10.48808848],
       [  0.        ,   0.        ,   0.        ]])

In [76]: ii = np.unravel_index(np.argsort(dists.ravel())[-2:], dists.shape)

In [77]: ii
Out[77]: (array([0, 1]), array([1, 2]))

In [78]: dists[ii]
Out[78]: array([  9.69535971,  10.48808848])
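The same idea can be wrapped in a small helper. The sketch below is a variation on the session above, not the answerer's exact code: the function name top_n_pairs is hypothetical, the distance matrix is again computed with NumPy alone instead of sklearn (an assumption for self-containment), and it selects the upper triangle with np.triu_indices_from instead of zeroing the lower triangle in place.

```python
import numpy as np

def top_n_pairs(dists, n):
    """Return (rows, cols, values) for the n largest entries above the diagonal.

    Hypothetical helper; assumes dists is a square, symmetric matrix.
    """
    # Indices of the strict upper triangle: each unordered pair appears once.
    rows, cols = np.triu_indices_from(dists, k=1)
    vals = dists[rows, cols]
    # Sort ascending, reverse for largest-first, keep the first n.
    order = np.argsort(vals)[::-1][:n]
    return rows[order], cols[order], vals[order]

data = np.array([[5, 6, 1], [2, 0, 8], [4, 9, 3]])

# Assumption: Euclidean distances via broadcasting,
# standing in for sklearn.metrics.pairwise_distances(data).
dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)

rows, cols, vals = top_n_pairs(dists, 2)
for i, j, d in zip(rows, cols, vals):
    print(f"data[{i}] and data[{j}] are {d:.4f} apart")
```

Working on the extracted upper triangle leaves dists unmodified and means pairs that genuinely have distance 0 are not confused with masked-out cells. Results come back largest first, whereas the slice [-2:] in the session above returns them in ascending order.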
