Is it possible to use KDTree with cosine similarity?


Question

It looks like I can't use this similarity metric with sklearn's KDTree, for example, but I need it because I am measuring the similarity of word vectors. What is a fast, robust custom algorithm for this case? I know about locality-sensitive hashing, but it would need a lot of tuning and testing to find good parameters.

Recommended answer

If you normalize all the data points first, the ranking you get with cosine similarity is equivalent to the rank order of the Euclidean distance: the nearest points by Euclidean distance are the ones with the highest cosine similarity. So you can use a KD tree to find the k nearest neighbors, but you will need to recompute the cosine similarity for the results.
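As a rough illustration of the normalization trick, here is a minimal sketch using `sklearn.neighbors.KDTree`. The helper names `build_cosine_index` and `cosine_knn` are hypothetical, and the random matrix only stands in for real word vectors. It relies on the identity that, for unit vectors, ||a - b||^2 = 2 - 2 cos(a, b), so the cosine similarity can be recovered from the Euclidean distance the tree returns.

```python
import numpy as np
from sklearn.neighbors import KDTree


def build_cosine_index(vectors):
    """Normalize rows to unit length and index them with a Euclidean KD tree."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return KDTree(unit), unit


def cosine_knn(tree, query, k=5):
    """Return (indices, cosine similarities) of the k nearest neighbors of one query."""
    q = query / np.linalg.norm(query)
    dist, idx = tree.query(q.reshape(1, -1), k=k)
    # For unit vectors, ||a - b||^2 = 2 - 2*cos(a, b), so cos = 1 - d^2 / 2.
    cos_sim = 1.0 - dist[0] ** 2 / 2.0
    return idx[0], cos_sim


# Random data as a stand-in for word vectors (an assumption for this sketch).
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 50))

tree, unit = build_cosine_index(vectors)
idx, sims = cosine_knn(tree, vectors[0], k=5)
```

The neighbors come back sorted by increasing Euclidean distance, which is the same as decreasing cosine similarity. Note that the input must not contain zero vectors, since they cannot be normalized.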

Cosine similarity, as normally presented, is not a distance metric, but it can be transformed into one. Once that is done, you can use other structures such as ball trees to do accelerated nearest-neighbor search with cosine similarity directly. I've implemented this in the JSAT library, if you are interested in a Java implementation.
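The answer does not say which transformation it has in mind. One common choice, assumed here, is the angular distance arccos(cos(a, b)) / pi. The sketch below (all function names are illustrative) shows why the naive `1 - cos` form is not a true metric, while the angular form satisfies the triangle inequality on a small example with vectors at 0, 45 and 90 degrees.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """1 - cos: convenient, but it can violate the triangle inequality."""
    return 1.0 - cosine_similarity(a, b)


def angular_distance(a, b):
    """Angle between a and b scaled to [0, 1]; obeys the triangle inequality."""
    s = max(-1.0, min(1.0, cosine_similarity(a, b)))  # guard against rounding
    return math.acos(s) / math.pi


# Three 2-D vectors at 0, 45 and 90 degrees.
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

# 1 - cos: d(a, c) = 1, but d(a, b) + d(b, c) is about 0.586, so it is violated.
violates = cosine_distance(a, c) > cosine_distance(a, b) + cosine_distance(b, c)

# Angular: d(a, c) = 0.5 and d(a, b) + d(b, c) = 0.25 + 0.25, so it holds.
holds = angular_distance(a, c) <= angular_distance(a, b) + angular_distance(b, c) + 1e-12
```

Structures such as ball trees prune the search using the triangle inequality, which is why a true metric is needed before they can be applied.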
