mlpack nearest neighbor with cosine distance?
Question
I'd like to use the NeighborSearch class in mlpack to perform KNN classification on some vectors representing documents.
I'd like to use Cosine Distance, but I'm having trouble. I think the way to do this is to use the inner-product metric "IPMetric" and specify the CosineDistance kernel... This is what I have:
NeighborSearch<NearestNeighborSort, IPMetric<CosineDistance>> nn(X_train);
But I get the following compile errors:
/usr/include/mlpack/core/tree/hrectbound_impl.hpp:211:15: error: ‘Power’ is not a member of ‘mlpack::metric::IPMetric<mlpack::kernel::CosineDistance>’
sum += pow((lower + fabs(lower)) + (higher + fabs(higher)),
^
/usr/include/mlpack/core/tree/hrectbound_impl.hpp:220:3: error: ‘TakeRoot’ is not a member of ‘mlpack::metric::IPMetric<mlpack::kernel::CosineDistance>’
if (MetricType::TakeRoot)
^
I suspect that the problem may be that the default tree type, KDTree, does not support this distance metric? If that's the issue, is there a tree type that does work for CosineDistance?
Finally, is it possible to use a brute-force search? I can't seem to find a way to use no tree at all...
Thanks!
Recommended answer
Unfortunately, as you suspected, arbitrary metric types don't work with the KDTree. This is because the kd-tree requires a distance that can be decomposed across dimensions, which is not possible with IPMetric. Instead, why not try the cover tree? Building the tree may take somewhat longer, but it should give comparable performance:
NeighborSearch<NearestNeighborSort, IPMetric<CosineDistance>, arma::mat,
               tree::StandardCoverTree> nn(X_train);
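To make the "cannot be decomposed across dimensions" point concrete, here is a minimal standalone sketch in plain C++ (no mlpack) of the distance a kernel induces. It assumes IPMetric uses the usual kernel-induced form d(a, b) = sqrt(K(a, a) + K(b, b) - 2 K(a, b)) and that the CosineDistance kernel evaluates cosine similarity. The function names below are illustrative, not mlpack's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity K(a, b) = <a, b> / (|a| * |b|).
// Assumes both vectors have the same length and neither is all zeros.
double CosineSimilarity(const std::vector<double>& a,
                        const std::vector<double>& b)
{
  assert(a.size() == b.size());
  double dot = 0.0, normA = 0.0, normB = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (std::sqrt(normA) * std::sqrt(normB));
}

// Kernel-induced distance (assumed form, see the note above):
//   d(a, b) = sqrt(K(a, a) + K(b, b) - 2 * K(a, b)).
// For cosine similarity this reduces to sqrt(2 - 2 * cos(a, b)).
double KernelInducedDistance(const std::vector<double>& a,
                             const std::vector<double>& b)
{
  const double squared = CosineSimilarity(a, a) + CosineSimilarity(b, b)
      - 2.0 * CosineSimilarity(a, b);
  // Clamp tiny negative values caused by floating-point rounding.
  return std::sqrt(std::max(squared, 0.0));
}
```

The normalization by |a| and |b| couples every dimension, so the result is not a sum of independent per-dimension terms. That is presumably why the kd-tree's bounding-box code, which expects LMetric-style members such as Power and TakeRoot, fails to compile against it.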
If you want to do brute-force search, specify the search mode in the constructor:
NeighborSearch<NearestNeighborSort, IPMetric<CosineDistance>, arma::mat,
               tree::StandardCoverTree> nn(X_train, NAIVE_MODE);
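For comparison, or as a sanity check on small data, brute-force search under cosine distance is simple to write by hand. The following is a rough sketch in plain C++ that does not use mlpack. The names `Vec`, `OneMinusCosine` and `BruteForceKnn` are hypothetical helpers, and it ranks by 1 - cos(a, b), which should give the same neighbor ordering as the kernel-induced distance sqrt(2 - 2 cos(a, b)) because both decrease as the cosine grows.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<double>;

// 1 - cosine similarity; smaller means more similar.
// Assumes equal-length, non-zero vectors.
double OneMinusCosine(const Vec& a, const Vec& b)
{
  assert(a.size() == b.size());
  double dot = 0.0, normA = 0.0, normB = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1.0 - dot / (std::sqrt(normA) * std::sqrt(normB));
}

// Returns the indices of the k training points closest to the query,
// nearest first, by comparing the query against every training point.
std::vector<std::size_t> BruteForceKnn(const std::vector<Vec>& train,
                                       const Vec& query,
                                       std::size_t k)
{
  std::vector<std::pair<double, std::size_t>> scored;
  scored.reserve(train.size());
  for (std::size_t i = 0; i < train.size(); ++i)
    scored.emplace_back(OneMinusCosine(train[i], query), i);

  k = std::min(k, scored.size());
  // Only the first k entries need to end up in sorted order.
  std::partial_sort(scored.begin(),
                    scored.begin() + static_cast<std::ptrdiff_t>(k),
                    scored.end());

  std::vector<std::size_t> neighbors;
  neighbors.reserve(k);
  for (std::size_t i = 0; i < k; ++i)
    neighbors.push_back(scored[i].second);
  return neighbors;
}
```

This costs one distance evaluation per training point for each query, so it only makes sense for modest dataset sizes. For KNN classification, the returned indices can be used to look up training labels and take a majority vote.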
I hope this is helpful; let me know if I can clarify anything.