Why doesn't scikit-learn's Nearest Neighbors seem to return proper cosine similarity distances?

Views: 276 Posted: 2020/5/16 23:26:15 Tags: python-2.7 scikit-learn nearest-neighbor cosine-similarity


Problem description

I am trying to use scikit-learn's Nearest Neighbors implementation to find the column vectors closest to a given column vector in a matrix of random values.

This code is supposed to find the nearest neighbors of column 21, then check the actual cosine similarity of those neighbors against column 21.

from sklearn.neighbors import NearestNeighbors
import sklearn.metrics.pairwise as smp
import numpy as np

test = np.random.randint(0, 5, (50, 50))
nbrs = NearestNeighbors(n_neighbors=5, algorithm='auto', metric=smp.cosine_similarity).fit(test)
distances, indices = nbrs.kneighbors(test)

x = 21

for idx, d in enumerate(indices[x]):
    sim2 = smp.cosine_similarity(test[:, x], test[:, d])

    print "sklearns cosine similarity would be ", sim2
    print 'sklearns reported distance is', distances[x][idx]
    print 'sklearns if that distance was cosine, the similarity would be: ', 1 - distances[x][idx]

The output looks something like:

sklearns cosine similarity would be  [[ 0.66190748]]
sklearns reported distance is 0.616586738214
sklearns if that distance was cosine, the similarity would be:  0.383413261786

So the output of kneighbors is neither the cosine distance nor the cosine similarity. What gives?
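For comparison, here is a minimal sketch of the lookup the question describes, written for Python 3 and a recent scikit-learn (both are assumptions; the question itself targets Python 2.7). It relies on two points that may explain the mismatch: scikit-learn treats each row of the fitted array as one sample, so the matrix is transposed to search over columns, and the `metric` argument expects a distance rather than a similarity, so the built-in `'cosine'` metric is used with `algorithm='brute'`. The seed and the variable names `columns` and `sims` are illustrative only.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors

# Fixed seed (an assumption, for reproducibility only).
rng = np.random.RandomState(0)
test = rng.randint(0, 5, (50, 50))

# scikit-learn treats each ROW as one sample, so transpose
# to make every original column a sample.
columns = test.T

# 'cosine' is a built-in distance (1 - cosine similarity);
# it is supported by the brute-force search.
nbrs = NearestNeighbors(n_neighbors=5, algorithm='brute', metric='cosine').fit(columns)
distances, indices = nbrs.kneighbors(columns)

x = 21
sims = []
for d, dist in zip(indices[x], distances[x]):
    # cosine_similarity expects 2-D inputs, hence the reshape.
    sim = cosine_similarity(columns[x].reshape(1, -1), columns[d].reshape(1, -1))[0, 0]
    sims.append(sim)
    print("neighbor column", d, "cosine similarity", sim, "reported distance", dist)
```

Under these assumptions, each reported distance should equal one minus the cosine similarity computed directly, and column 21 should appear as its own nearest neighbor with a distance of approximately zero.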

Also, as an aside, I thought sklearn's Nearest Neighbors implementation was not an Approximate Nearest Neighbors approach, yet it doesn't seem to find the actual best neighbors in my dataset, compared with the results I get if I iterate over the matrix and check the similarity of column 21 to all the other columns. Am I misunderstanding something basic here?
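The manual check mentioned here (comparing column 21 against every other column) can be sketched with NumPy alone. This is an illustrative reconstruction, not the asker's actual loop: it normalizes each column to unit length so that a dot product gives the cosine similarity, then ranks all columns by similarity to column 21. The seed and the names `unit`, `sims`, and `top5` are assumptions.

```python
import numpy as np

# Fixed seed (an assumption, for reproducibility only).
rng = np.random.RandomState(0)
test = rng.randint(0, 5, (50, 50)).astype(float)

x = 21

# Scale every column to unit length; the dot product of two
# unit vectors is their cosine similarity.
unit = test / np.linalg.norm(test, axis=0)

# Cosine similarity of column x against all 50 columns.
sims = unit.T @ unit[:, x]

# Rank columns from most to least similar and keep the top five.
order = np.argsort(-sims)
top5 = order[:5]
for col in top5:
    print("column", col, "cosine similarity", sims[col])
```

The first entry is column 21 itself, with similarity 1 up to floating-point rounding. An exact (non-approximate) nearest-neighbor search over the columns using cosine distance should return this same ranking.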
