Find K nearest neighbors, starting from a distance matrix
Problem description
I'm looking for a well-optimized function that accepts an n x n distance matrix and returns an n x k matrix with the indices of the k nearest neighbors of the ith data point in the ith row.
I find a gazillion different R packages that let you do KNN, but they all seem to bundle the distance computations with the sorting algorithm in the same function. In particular, for most routines the main argument is the original data matrix, not a distance matrix. In my case, I'm using a nonstandard distance on mixed variable types, so I need to separate the sorting problem from the distance computations.
This is not exactly a daunting problem: I obviously could just use the order function inside a loop to get what I want (see my solution below), but this is far from optimal. For example, the sort function with partial = 1:k goes much faster when k is small (less than 11), but unfortunately it returns only the sorted values rather than the desired indices.
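For reference, an order-in-a-loop baseline of the kind described above might look like the sketch below. This is not the asker's actual solution; the function name knn_order is made up for illustration, and it assumes that a point should never be returned as its own neighbor.

```r
# Hypothetical baseline (name is illustrative): full sort of every row with order().
knn_order <- function(D, k) {
  n <- nrow(D)
  stopifnot(is.matrix(D), ncol(D) == n, k >= 1, k < n)
  nn <- matrix(0L, n, k)
  for (i in seq_len(n)) {
    ord <- order(D[i, ])        # indices of row i sorted by increasing distance
    ord <- ord[ord != i]        # assumption: exclude the point itself
    nn[i, ] <- ord[seq_len(k)]  # keep the k closest remaining points
  }
  nn
}

# Small usage example on a hand-written symmetric distance matrix
D_order <- matrix(c(0, 1, 4,
                    1, 0, 2,
                    4, 2, 0), nrow = 3, byrow = TRUE)
nn_order <- knn_order(D_order, k = 2)
```

Each row costs a full sort of n distances, which is the overhead the question is trying to avoid when k is much smaller than n.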
Recommended answer
Try the FastKNN package from CRAN (although it is not well documented). It offers a k.nearest.neighbors function that accepts an arbitrary distance matrix. The example below computes the matrix you need.
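library(FastKNN)  # provides k.nearest.neighbors()

# arbitrary data
train <- matrix(sample(c("a", "b", "c"), 12, replace = TRUE), ncol = 2)  # n x 2
n <- nrow(train)
# placeholder distances; substitute your own n x n distance matrix here
distMatrix <- matrix(runif(n^2, 0, 1), ncol = n)  # n x n

# matrix of neighbours: row i holds the neighbours of point i
k <- 3
nn <- matrix(0, n, k)  # n x k
for (i in 1:n) {
  nn[i, ] <- k.nearest.neighbors(i, distMatrix, k = k)
}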
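If you would rather avoid a package dependency, the partial-sort idea from the question can be turned into indices in base R. The sketch below is an assumption-laden illustration rather than anything FastKNN provides: knn_partial is a made-up name, the point itself is excluded by setting its own distance to Inf, and ties at the k-th distance are broken by order().

```r
# Hypothetical helper (not part of FastKNN): use a partial sort to find the
# k-th smallest distance in each row, then order only the candidates below it.
knn_partial <- function(D, k) {
  n <- nrow(D)
  stopifnot(is.matrix(D), ncol(D) == n, k >= 1, k < n)
  nn <- matrix(0L, n, k)
  for (i in seq_len(n)) {
    d <- D[i, ]
    d[i] <- Inf                     # assumption: never pick the point itself
    thr <- sort(d, partial = k)[k]  # k-th smallest distance, without a full sort
    cand <- which(d <= thr)         # at least k candidates (more if there are ties)
    nn[i, ] <- cand[order(d[cand])][seq_len(k)]
  }
  nn
}

# Small usage example on a hand-written symmetric distance matrix
D_partial <- matrix(c(0, 1, 4,
                      1, 0, 2,
                      4, 2, 0), nrow = 3, byrow = TRUE)
nn_partial <- knn_partial(D_partial, k = 2)
```

Only the handful of candidates at or below the threshold are fully ordered, so the per-row work is dominated by the partial sort. Whether this actually beats a plain order() call depends on n and k, so it is worth timing on your own data.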
Note: you can always search the CRAN package list (Ctrl+F for 'knn') to find related functions: https://cran.r-project.org/web/packages/available_packages_by_name.html