Find K nearest neighbors, starting from a distance matrix


Problem description

I'm looking for a well-optimized function that accepts an n X n distance matrix and returns an n X k matrix with the indices of the k nearest neighbors of the ith datapoint in the ith row.

I find a gazillion different R packages that let you do KNN, but they all seem to include the distance computations along with the sorting algorithm within the same function. In particular, for most routines the main argument is the original data matrix, not a distance matrix. In my case, I'm using a nonstandard distance on mixed variable types, so I need to separate the sorting problem from the distance computations.

This is not exactly a daunting problem -- I obviously could just use the order function inside a loop to get what I want (see my solution below), but this is far from optimal. For example, the sort function with partial = 1:k when k is small (less than 11) goes much faster, but unfortunately returns only sorted values rather than the desired indices.
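The baseline described above (calling order inside a loop) might look roughly like the sketch below. The helper name knn_from_dist is hypothetical and not taken from any package, and excluding each point from its own neighbour list is an assumption; the question does not say whether self-matches should be kept.

```r
# Baseline sketch: k nearest neighbours from a precomputed n x n distance matrix.
# `knn_from_dist` is a hypothetical name used only for illustration.
knn_from_dist <- function(D, k) {
  n <- nrow(D)
  stopifnot(ncol(D) == n, k < n)
  nn <- matrix(0L, n, k)
  for (i in seq_len(n)) {
    d <- D[i, ]
    d[i] <- Inf                       # assumption: a point is not its own neighbour
    nn[i, ] <- order(d)[seq_len(k)]   # full sort of row i, keep the first k indices
  }
  nn
}

# Small symmetric example with a zero diagonal
D <- matrix(c(0, 1, 4,
              1, 0, 2,
              4, 2, 0), nrow = 3, byrow = TRUE)
nn <- knn_from_dist(D, k = 2)
```

Each row is fully sorted here, which is the cost the question hopes to avoid when k is much smaller than n.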

Solution

Try the FastKNN package from CRAN (although it is not well documented). It offers the k.nearest.neighbors function, which accepts an arbitrary distance matrix. Below is an example that computes the matrix you need.

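library(FastKNN)

# arbitrary data (stand-in for the real mixed-type observations)
train <- matrix(sample(c("a", "b", "c"), 12, replace = TRUE), ncol = 2)  # n x 2
n <- nrow(train)

# placeholder distances; substitute your own n x n distance matrix here
distMatrix <- matrix(runif(n^2, 0, 1), ncol = n)  # n x n

# matrix of neighbours: row i holds the indices of the k nearest neighbours of point i
k <- 3
nn <- matrix(0, n, k)  # n x k
for (i in 1:n)
   nn[i, ] <- k.nearest.neighbors(i, distMatrix, k = k)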
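If an extra package dependency is unwanted, the partial-sort idea from the question can still be made to return indices in base R. The following is only a sketch under stated assumptions: knn_partial is a hypothetical helper name, the distance matrix is assumed to contain no NA values, and each point is excluded from its own neighbour list. It uses sort with partial = k to find the k-th smallest distance in a row, then orders only the candidates at or below that threshold.

```r
# Sketch: recover neighbour indices with a partial sort instead of a full sort.
# `knn_partial` is a hypothetical name used only for illustration.
knn_partial <- function(D, k) {
  n <- nrow(D)
  stopifnot(ncol(D) == n, k < n)
  nn <- matrix(0L, n, k)
  for (i in seq_len(n)) {
    d <- D[i, ]
    d[i] <- Inf                            # assumption: a point is not its own neighbour
    kth <- sort(d, partial = k)[k]         # k-th smallest distance in row i
    cand <- which(d <= kth)                # at least k candidates (more if there are ties)
    nn[i, ] <- cand[order(d[cand])][seq_len(k)]  # order only the candidates
  }
  nn
}

# Small symmetric example with a zero diagonal
D <- matrix(c(0, 1, 4,
              1, 0, 2,
              4, 2, 0), nrow = 3, byrow = TRUE)
nn_partial <- knn_partial(D, k = 2)
```

Whether this actually beats a plain order call depends on n and k, so it is worth benchmarking on the real data before relying on it.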

Note: You can always search the CRAN package list (Ctrl+F) for 'knn' to find related functions: https://cran.r-project.org/web/packages/available_packages_by_name.html
