Find K nearest neighbors, starting from a distance matrix
Problem description
I'm looking for a well-optimized function that accepts an n x n distance matrix and returns an n x k matrix with the indices of the k nearest neighbors of the ith data point in the ith row.
I find a gazillion different R packages that let you do KNN, but they all seem to bundle the distance computations and the sorting algorithm into the same function. In particular, for most routines the main argument is the original data matrix, not a distance matrix. In my case, I'm using a nonstandard distance on mixed variable types, so I need to separate the sorting problem from the distance computations.
This is not exactly a daunting problem: I could obviously just use the order function inside a loop to get what I want (see my solution below), but this is far from optimal. For example, the sort function with partial = 1:k goes much faster when k is small (less than 11), but unfortunately it returns only the sorted values rather than the desired indices.
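For reference, here is a minimal base-R sketch of the two ideas in the paragraph above. It is not the asker's original solution; the function names knn_order and knn_partial are hypothetical. The first is the plain order-in-a-loop baseline. The second uses a partial sort only to find the k-th smallest distance, then recovers the indices with which, so the full ordering is applied to a handful of candidates rather than the whole row. Both assume D is a square numeric matrix without NAs, and both exclude the point itself from its own neighbor list.

```r
# Baseline: full ordering of every row (hypothetical helper name).
knn_order <- function(D, k) {
  stopifnot(is.matrix(D), nrow(D) == ncol(D), k >= 1, k < nrow(D))
  n <- nrow(D)
  nn <- matrix(0L, n, k)
  for (i in seq_len(n)) {
    d <- D[i, ]
    d[i] <- Inf                      # never report a point as its own neighbor
    nn[i, ] <- order(d)[seq_len(k)]  # indices of the k smallest distances
  }
  nn
}

# Variant: a partial sort finds the k-th smallest distance, then only the
# candidates at or below that threshold are fully ordered.
knn_partial <- function(D, k) {
  stopifnot(is.matrix(D), nrow(D) == ncol(D), k >= 1, k < nrow(D))
  n <- nrow(D)
  nn <- matrix(0L, n, k)
  for (i in seq_len(n)) {
    d <- D[i, ]
    d[i] <- Inf
    kth <- sort(d, partial = k)[k]   # k-th smallest distance in row i
    cand <- which(d <= kth)          # at least k indices (more if there are ties)
    nn[i, ] <- cand[order(d[cand])][seq_len(k)]
  }
  nn
}

# Small usage example: four points on a line.
x <- c(0, 1, 3, 7)
D <- as.matrix(dist(x))
knn_order(D, 2)
knn_partial(D, 2)
```

Ties are broken by index in both versions, so the two functions should return the same matrix. Whether the partial-sort variant is actually faster depends on n and k, so it is worth benchmarking on your own data.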
Try the FastKNN CRAN package (although it is not well documented). It offers a k.nearest.neighbors function that accepts an arbitrary distance matrix. The example below computes the matrix you need.
library(FastKNN)

# arbitrary data
train <- matrix(sample(c("a", "b", "c"), 12, replace = TRUE), ncol = 2)  # n x 2
n <- dim(train)[1]
# placeholder distances; substitute your own n x n distance matrix here
distMatrix <- matrix(runif(n^2, 0, 1), ncol = n)  # n x n

# matrix of neighbours
k <- 3
nn <- matrix(0, n, k)  # n x k
for (i in 1:n)
  nn[i, ] <- k.nearest.neighbors(i, distMatrix, k = k)
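Once you have the n x k index matrix, a quick sanity check is to look up the corresponding distances and confirm that each row is sorted. The sketch below is self-contained and uses only base R, so the neighbor matrix is built with order rather than FastKNN; the variable names are illustrative, and it assumes a proper distance matrix in which each point's zero distance to itself is the strict row minimum.

```r
set.seed(1)
n <- 6
k <- 3

# A genuine distance matrix (zero diagonal) from random 2-D points.
D <- as.matrix(dist(matrix(rnorm(n * 2), ncol = 2)))

# Base-R neighbor indices: position 1 of order() is the point itself
# (distance 0), so take positions 2..(k + 1).
nn <- t(apply(D, 1, function(d) order(d)[2:(k + 1)]))

# Look up D[i, nn[i, j]] for every i, j with two-column matrix indexing.
# as.vector(nn) runs column by column, so row indices repeat 1..n k times.
nn_dist <- matrix(D[cbind(rep(seq_len(n), times = k), as.vector(nn))], n, k)

# Each row of nn_dist should be nondecreasing.
sorted_ok <- all(apply(nn_dist, 1, function(r) !is.unsorted(r)))
sorted_ok
```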
Note: You can always search the CRAN package list (Ctrl+F for 'knn') to find related functions: https://cran.r-project.org/web/packages/available_packages_by_name.html