Error with knn function


Question

I am trying to run this line:

knn(mydades.training[,-7],mydades.test[,-7],mydades.training[,7],k=5)

but I always get this error:

Error in knn(mydades.training[, -7], mydades.test[, -7], mydades.training[,  : 
  NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[,  :
  NAs introduced by coercion
2: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[,  :
  NAs introduced by coercion

Any ideas, please?

PS: mydades.training and mydades.test are defined as follows:

N <- nrow(mydades) 
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]

Recommended answer

I suspect that your issue lies in having non-numeric data fields in 'mydades'. The error line:

NA/NaN/Inf in foreign function call (arg 6)

makes me suspect that the call from the knn function to its C implementation fails. Many functions in R call an underlying, more efficient C implementation instead of implementing the algorithm purely in R. If you type just 'knn' in your R console, you can inspect the R implementation of 'knn'. It contains the following line:

 Z <- .C(VR_knn, as.integer(k), as.integer(l), as.integer(ntr), 
        as.integer(nte), as.integer(p), as.double(train), as.integer(unclass(clf)), 
        as.double(test), res = integer(nte), pr = double(nte), 
        integer(nc + 1), as.integer(nc), as.integer(FALSE), as.integer(use.all))

where .C means that we're calling a C function named 'VR_knn' with the provided function arguments. Since you have two of the warnings

NAs introduced by coercion

I think two of the as.double/as.integer calls fail and introduce NA values. Counting the arguments of the .C call, the 6th one is:

as.double(train)

which may fail in cases such as:

# as.double can not translate text fields to doubles, they are coerced to NA-values:
> as.double("sometext")
[1] NA
Warning message:
NAs introduced by coercion
# while the following text is cast to double without an error:
> as.double("1.23")
[1] 1.23
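To watch the same coercion happen inside knn itself, here is a minimal reproduction sketch. It assumes the 'class' package (which provides knn) is installed; the data frames 'train' and 'test' and the labels 'cl' are made up for illustration and are not the asker's data. The warnings are collected rather than printed so they can be inspected afterwards.

```r
library(class)  # assumed to be installed; provides knn()

# Made-up data: column 'x' is text, so as.matrix() yields a character matrix
train <- data.frame(x = c("a", "b", "c", "d"), y = c(1, 2, 3, 4))
test  <- data.frame(x = c("e", "f"), y = c(5, 6))
cl    <- factor(c("u", "u", "v", "v"))

seen_warnings <- character(0)
result <- withCallingHandlers(
  # On error, return the error message instead of stopping the script
  tryCatch(knn(train, test, cl, k = 1),
           error = function(e) conditionMessage(e)),
  # Record every warning, then silence it
  warning = function(w) {
    seen_warnings <<- c(seen_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(seen_warnings)  # coercion warnings raised while preparing the arguments
print(result)         # the error message, if knn failed as expected
```

If knn had succeeded, 'result' would be a factor of predictions instead of a character string.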

You get two of the coercion warnings, which are probably raised by 'as.double(train)' and 'as.double(test)'. Since you did not give us exact details of what 'mydades' looks like, here are some of my best guesses (using artificial multivariate normal data):

library(MASS)   # mvrnorm
library(class)  # knn
mydades <- mvrnorm(100, mu=c(1:6), Sigma=matrix(1:36, ncol=6))
mydades <- cbind(mydades, sample(LETTERS[1:5], 100, replace=TRUE))

# This breaks knn
mydades[3,4] <- Inf
# This breaks knn
mydades[4,3] <- -Inf
# These two, however, do not produce the "NAs introduced by coercion" warning

# This breaks knn and gives the same error; just some raw text
mydades[1,2] <- mydades[50,1] <- "foo"
mydades[100,3] <- "bar"

# ... or perhaps wrongly formatted exponential numbers?
mydades[1,1] <- "2.34EXP-05"

# ... or wrong decimal symbol?
mydades[3,3] <- "1,23" 
# should be 1.23, as R uses '.' as decimal symbol and not ','

# ... or most likely a whole column is non-numeric, since the error is given twice (as.double problem both in training AND test set)
mydades[,1] <- sample(letters[1:5],100,replace=TRUE)

I would not keep both the numeric data and the class labels in a single matrix; perhaps you could split the data as:

mydadesnumeric <- mydades[,1:6] # 6 first columns
mydadesclasses <- mydades[,7]
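As a rough sketch of how such a split could feed knn, the example below converts the feature columns to a numeric matrix and the labels to a factor before the call. It assumes the 'class' package is installed; 'toy', 'features', 'labels' and 'train_idx' are illustrative names, and the random data merely stands in for 'mydades'.

```r
library(class)  # assumed to be installed; provides knn()

set.seed(1)  # only to make this toy example reproducible

# Toy stand-in for 'mydades': 6 numeric columns plus a label column.
# cbind() with a character vector turns the whole matrix into text.
toy <- cbind(matrix(rnorm(60 * 6), ncol = 6),
             sample(LETTERS[1:3], 60, replace = TRUE))

features <- matrix(as.numeric(toy[, 1:6]), ncol = 6)  # numeric matrix
labels   <- factor(toy[, 7])                           # class labels

# Fail early, with a clear message, if anything is not a finite number
stopifnot(all(is.finite(features)))

# One third of the rows for training, the rest for testing
train_idx <- sample(nrow(features), round(nrow(features) / 3))
pred <- knn(train = features[train_idx, ],
            test  = features[-train_idx, ],
            cl    = labels[train_idx],
            k     = 5)

print(table(predicted = pred, actual = labels[-train_idx]))
```

Keeping the labels out of the feature matrix means a stray text value can never silently turn the numeric columns into character data.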

Calling

str(mydades); summary(mydades)

may also help you/us locate the problematic data entries, so that you can correct them to numeric entries or omit the non-numeric fields.
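If str and summary are not enough to pinpoint the culprits, something like the following sketch could list the offending cells directly. 'find_bad_cells' is a hypothetical helper written for this answer, not a function from any package, and 'toy' is made-up data.

```r
# Hypothetical helper: report the cells of a character matrix or data frame
# that do not convert to finite numbers.
find_bad_cells <- function(x) {
  m <- as.matrix(x)
  # suppressWarnings hides the expected "NAs introduced by coercion" warning
  num <- suppressWarnings(matrix(as.numeric(m), nrow = nrow(m)))
  # Cells that became NA or are NaN/Inf/-Inf after conversion
  bad <- which(!is.finite(num), arr.ind = TRUE)
  data.frame(row = bad[, "row"], col = bad[, "col"], value = m[bad],
             stringsAsFactors = FALSE)
}

# Made-up example: raw text, a decimal comma, and an infinite value
toy <- cbind(c("1.5", "foo", "3"), c("1,23", "2", "Inf"))
bad <- find_bad_cells(toy)
print(bad)
```

Each reported row/col pair can then be inspected in the original data and either corrected or dropped before calling knn.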

The rest of the code (run after breaking the data), as provided by you:

N <- nrow(mydades) 
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]

# 7th column seems to be the class labels
knn(train=mydades.training[,-7],test=mydades.test[,-7],mydades.training[,7],k=5)
