Q: KNN in R - strange behavior


Question

Does anyone know why the KNN R code below gives different predictions for different seeds? This is strange because K <- 5, so the majority is well defined. In addition, the floating-point numbers are large, so no data-precision problem arises (like in this post).

library(class)

set.seed(642002713)
m = 20
n = 1000
from = -(2^30)
to = -(from)
train = matrix(runif(m*n, from, to), nrow=m, ncol=n)
trainLabels = sample.int(2, size = m, replace = TRUE) - 1
test = matrix(runif(n, from, to), nrow=1)

K <- 5

seed <- 544336746
set.seed(seed)
pred_1 <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred_1, ", seed: ", seed)
#predicted: 0, seed: 544336746

seed <- 621513172 
set.seed(seed)
pred_2 <- knn(train=train, test=test, cl = trainLabels, k=K)
message("predicted: ", pred_2, ", seed: ", seed)
#predicted: 1, seed: 621513172

Manual check:

euc.dist <- function(x1, x2) sqrt(sum((x1 - x2) ^ 2))
result = vector(mode="numeric", length=nrow(train))
for(i in 1:nrow(train)) {
  result[i] <- euc.dist(train[i,], test)
}
a <- data.frame(result, trainLabels)
names(a) = c("RSSE", "labels")
b <- a[order(a$RSSE, decreasing = TRUE), ]  # largest distances first (column is named RSSE)
headK <- head(b, K)
message("Manual predicted K: ", paste(K," class:", names(which.max(table(headK[,2])))))
#Manual predicted K: 5  class: 1

This gives the prediction 1, based on the K (= 5) rows with the highest RSSE:

RSSE             labels
28479706980      1
28472893026      0
28063242772      1
27966740954      1
27927401005      1

So the majority is well defined, and there is no problem of small floating-point differences in RSSE.
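One documented source of seed dependence in `class::knn` is tie handling: per its help page, ties for the kth nearest vector are all included in the vote, and a tied vote is broken at random. Whether that is what happens with the data above is not established here. The sketch below only shows the mechanism on made-up toy data (`toy_train`, `toy_cl`, and `toy_test` are hypothetical names), where two training points are equidistant from the test point and carry different labels.

```r
library(class)

# Toy data (made up): two training points at equal distance from the test point
toy_train <- matrix(c(-1, 1), ncol = 1)
toy_cl    <- factor(c(0, 1))
toy_test  <- matrix(0, ncol = 1)

# With k = 2 the vote is 1 vs 1, so knn() has to break the tie at random;
# the outcome then depends on the RNG state set by set.seed()
preds <- vapply(1:50, function(s) {
  set.seed(s)
  as.character(knn(train = toy_train, test = toy_test, cl = toy_cl, k = 2))
}, character(1))

print(table(preds))
```

If both classes show up in the table, the prediction depends on the seed even though the data never change.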

Recommended answer

When I scale and center the data, including the test set, I get 0 for both predictions.

My preprocessing:

sc<-function(x){(x-mean(x))/sd(x)}
train<-apply(train,1,sc)
train<-t(train)
test<-apply(test,1,sc)
test<-t(test)

and get:

> seed <- 544336746
> pred_1 <- knn(train=train, test=test, cl = trainLabels, k=K)
> message("predicted: ", pred_1, ", seed: ", seed)
predicted: 0, seed: 544336746

> seed <- 621513172
> pred_2 <- knn(train=train, test=test, cl = trainLabels, k=K)
> message("predicted: ", pred_2, ", seed: ", seed)
predicted: 0, seed: 621513172
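To make the preprocessing step concrete: `apply(x, 1, sc)` standardizes each row (each observation) on its own rather than each column (feature), and it returns the results as columns, which is why `t()` is needed afterwards. A minimal check on a made-up matrix (`toy` is a hypothetical name, not data from the post):

```r
# Same standardizing function as in the answer
sc <- function(x) (x - mean(x)) / sd(x)

# Toy data (made up): 3 observations, 4 features
set.seed(1)
toy <- matrix(runif(12, -100, 100), nrow = 3, ncol = 4)

# apply() over rows returns a 4 x 3 matrix, so transpose back to 3 x 4
scaled <- t(apply(toy, 1, sc))

row_means <- rowMeans(scaled)       # each row is centered at (numerically) 0
row_sds   <- apply(scaled, 1, sd)   # each row has standard deviation 1

print(dim(scaled))
print(round(row_means, 10))
print(round(row_sds, 10))
```

Scaling per feature instead would be `scale(toy)`, with the training set's centers and scales reused for the test set; that is a different choice from the one made in the answer.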

The manual check, which I edited to this form:

a <- data.frame(result, trainLabels)
names(a) = c("RSSE", "labels")
b <- a[with(a, order(a$RSSE)), ]
headK <- head(b, K)
message("Manual predicted K: ", paste(K," class:", names(which.max(table(headK[,2])))))
#Manual predicted K: 5  class: 0

and the result:

       RSSE labels
3  43.48199      0
17 43.61283      1
7  43.63948      1
8  43.69730      0
19 43.78931      0    
6  43.88009      0
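The manual check can also be wrapped in a small base-R helper that sorts distances in ascending order (nearest first) and reports how close the kth and (k+1)th distances are. This is a sketch, not part of the original answer: `manual_knn` and `rel_gap` are hypothetical names, and a small relative gap only hints that the neighbor set may be sensitive to tie handling.

```r
# Hypothetical helper (not from the original post): manual k-NN vote for one test row
manual_knn <- function(train, test_row, labels, k) {
  # Euclidean distance from every training row to the test row
  diffs <- sweep(train, 2, as.numeric(test_row))
  d <- sqrt(rowSums(diffs^2))

  ord <- order(d)                 # ascending: nearest neighbors first
  nearest <- ord[seq_len(k)]
  votes <- table(labels[nearest])

  # Relative gap between the k-th and (k+1)-th smallest distances
  gap <- if (k < length(d)) (d[ord[k + 1]] - d[ord[k]]) / d[ord[k]] else NA_real_

  list(
    prediction = names(votes)[which.max(votes)],
    neighbors  = data.frame(row = nearest, dist = d[nearest], label = labels[nearest]),
    rel_gap    = gap
  )
}

# Toy data (made up): two well-separated groups on a line
toy_train  <- matrix(c(0, 1, 2, 10, 11, 12), ncol = 1)
toy_labels <- c(0, 0, 0, 1, 1, 1)
res <- manual_knn(toy_train, test_row = 1.4, labels = toy_labels, k = 3)
print(res$neighbors)
print(res$prediction)
```

With the data from the question, the call would be `manual_knn(train, test, trainLabels, K)`.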
