在带有分类值的R中使用k-NN [英] using k-NN in R with categorical values

查看：91 发布时间：2020/4/26 11:02:41 r distance knn

本文介绍了在带有分类值的R中使用k-NN的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在寻找对大多数具有分类特征的数据进行分类的方法.为此，欧几里得距离(或任何其他假定距离的数字)不适合.

I'm looking to perform classification on data with mostly categorical features. For that purpose, Euclidean distance (or any other numerical assuming distance) doesn't fit.

我正在寻找[R]的kNN实现，可以在其中选择不同的距离方法，例如汉明距离. 有没有办法使用常见的kNN实现(例如{class}中的实现)和不同的距离度量功能?

I'm looking for a kNN implementation for [R] where it is possible to select different distance methods, like Hamming distance. Is there a way to use common kNN implementations like the one in {class} with different distance metric functions?

我正在使用R 2.15

I'm using R 2.15

推荐答案

只要您可以计算出距离/差异矩阵(以您喜欢的任何方式)，就可以轻松地执行kNN分类，而无需任何特殊程序包. /p>

As long as you can calculate a distance/dissimilarity matrix (in whatever way you like) you can easily perform kNN classification without the need of any special package.

# Generate dummy data
y <- rep(1:2, each=50)                          # True class memberships
x <- y %*% t(rep(1, 20)) + rnorm(100*20) < 1.5  # Dataset with 20 variables
design.set <- sample(length(y), 50)
test.set <- setdiff(1:100, design.set)

# Calculate distance and nearest neighbors
library(e1071)
d <- hamming.distance(x)
NN <- apply(d[test.set, design.set], 1, order)

# Predict class membership of the test set
k <- 5
pred <- apply(NN[, 1:k, drop=FALSE], 1, function(nn){
    tab <- table(y[design.set][nn])
    as.integer(names(tab)[which.max(tab)])      # This is a pretty dirty line
}

# Inspect the results
table(pred, y[test.set])

如果有人比上面的脏线知道更好的方法来找到向量中最常见的值，我将很高兴知道.

If anybody knows a better way of finding the most common value in a vector than the dirty line above, I'd be happy to know.

在k=1情况下，需要drop=FALSE自变量将NN的子集保留为矩阵.如果没有，它将被转换为向量，并且apply会引发错误.

The drop=FALSE argument is needed to preserve the subset of NN as matrix in the case k=1. If not it will be converted to a vector and apply will throw an error.

这篇关于在带有分类值的R中使用k-NN的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在带有分类值的R中使用k-NN [英] using k-NN in R with categorical values

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在带有分类值的R中使用k-NN [英] using k-NN in R with categorical values

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭