Why do I get the error "NAs introduced by coercion" when trying to run kNN in R?

Question

I am trying to run kNN on a dataset, but I keep getting an NA error. I have searched Stack Overflow extensively for a solution to this problem and could not find anything useful.

This is the dataset I am working with: https://www.kaggle.com/tsiaras/uk-road-safety-accidents-and-vehicles

I have converted every factor and integer variable in my predictors and target to numeric so that kNN can compute Euclidean distances. I have removed all the NAs, but kNN keeps throwing the following error message:

NAs introduced by coercion
Error in knn(train[2:nrow(train), c(11, 22, 23, 25, 27, 28)], test[(2:nrow(test)),  :
  NA/NaN/Inf in foreign function call (arg 6)

This is one example of how I am converting the predictors and running kNN:

as.numeric(levels(test$Road_Type))[levels(test$Road_Type)]
as.numeric(levels(train$Road_Type))[levels(train$Road_Type)]

train <- na.exclude(train)
test <- na.exclude(test) 

cl=as.numeric(train[2:nrow(train),5])
cl <- na.exclude(cl)
knn0 <- knn(train[2:nrow(train),c(11,22,23,25,27,28)], test[(2:nrow(test)),c(11,22,23,25,27,28)], cl)

I apply the same as.numeric conversion to all of the columns 11, 22, 23, 25, 27 and 28, and also to the target. I start at row 2 so that the labels are not included. I have also tried running the following code before passing the parameters into the kNN function:

sum(is.na(train[2:nrow(train),c(11,22,23,25,27,28)]))
sum(is.na(test[2:nrow(test),c(11,22,23,25,27,28)]))
sum(is.na(cl))

All three of these return 0, so there are no NA values before I pass the data into the kNN function.
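A minimal reproduction of why these checks can pass while `knn()` still fails. It assumes the `knn()` in question is `class::knn` and uses a small made-up data frame in place of the Kaggle data. `knn()` calls `as.matrix()` on the predictors, so a single character column turns the whole matrix into text; the later conversion to double then turns every non-numeric string into `NA`, which is what the "(arg 6)" error complains about.

```r
library(class)  # assumption: the knn() in the question is class::knn

# Hypothetical toy predictors: one integer column and one character column
train <- data.frame(
  Number_of_Vehicles = c(1L, 2L, 1L, 2L),
  Road_Type = c("Single carriageway", "Dual carriageway",
                "Single carriageway", "Roundabout")
)
test <- train[1:2, ]
cl <- factor(c("Slight", "Serious", "Slight", "Slight"))

# The NA check passes: nothing is missing yet
sum(is.na(train))  # 0

# as.matrix() on mixed columns yields a character matrix,
# and as.double() on it turns the non-numeric strings into NA
m <- as.matrix(train)
typeof(m)  # "character"
suppressWarnings(sum(is.na(as.double(m))))  # 4, one per Road_Type entry

# knn() performs the same coercion internally and stops with an error
res <- tryCatch(knn(train, test, cl), error = function(e) e)
inherits(res, "error")  # TRUE
```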

Edit

I fixed the issue by converting to numeric like this:

train$Road_Type <- as.numeric(as.integer(factor(train$Road_Type)))
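One caveat with this fix: `factor()` builds its levels from the values present in the vector it is given, so encoding `train` and `test` separately can map the same string to different numbers when the two sets contain different categories. A minimal sketch that builds one set of levels from both sets first; the helper name `encode_shared` and the toy vectors are hypothetical.

```r
# Hypothetical helper: encode a character column in train and test
# with one shared set of levels, so equal strings get equal codes
encode_shared <- function(train_col, test_col) {
  lv <- sort(unique(c(train_col, test_col)))
  list(
    train = as.numeric(factor(train_col, levels = lv)),
    test  = as.numeric(factor(test_col, levels = lv))
  )
}

train_rt <- c("Single carriageway", "Dual carriageway", "Roundabout")
test_rt  <- c("Single carriageway", "One way street")

# Separate encoding: "Single carriageway" is 3 in train but 2 in test
as.numeric(factor(train_rt))  # 3 1 2
as.numeric(factor(test_rt))   # 2 1

# Shared encoding: "Single carriageway" is 4 in both
enc <- encode_shared(train_rt, test_rt)
enc$train  # 4 1 3
enc$test   # 4 2
```

Label codes also impose an arbitrary order on the categories, so the Euclidean distance between, say, codes 1 and 3 has no real meaning; one-hot encoding (for example via `model.matrix()`) is a common alternative for kNN.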

Thanks to everyone who helped!

Recommended answer

Always look at your data first. That helps both you and others answer the question.

If we check your data, it looks like this:

str(df[, c(11, 22, 23, 25, 27, 28)])
'data.frame':   2047256 obs. of  6 variables:
 $ Junction_Control                 : chr  "Data missing or out of range" "Auto traffic signal" "Data missing or out of range" "Data missing or out of range" ...
 $ Number_of_Vehicles               : int  1 1 2 1 1 2 2 1 2 2 ...
 $ Pedestrian_Crossing.Human_Control: int  0 0 0 0 0 0 0 0 0 0 ...
 $ Police_Force                     : chr  "Metropolitan Police" "Metropolitan Police" "Metropolitan Police" "Metropolitan Police" ...
 $ Road_Type                        : chr  "Single carriageway" "Dual carriageway" "Single carriageway" "Single carriageway" ...
 $ Special_Conditions_at_Site       : chr  "None" "None" "None" "None" ...
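A quick way to list which of the chosen predictors are not numeric before calling `knn()`. The data frame here is a hypothetical two-row stand-in with the same column types as the `str()` output above.

```r
# Hypothetical stand-in with the same column types as the str() output above
df <- data.frame(
  Junction_Control = c("Data missing or out of range", "Auto traffic signal"),
  Number_of_Vehicles = c(1L, 1L),
  Pedestrian_Crossing.Human_Control = c(0L, 0L),
  Police_Force = c("Metropolitan Police", "Metropolitan Police"),
  Road_Type = c("Single carriageway", "Dual carriageway"),
  Special_Conditions_at_Site = c("None", "None")
)

# TRUE for columns knn() can use directly, FALSE for character/factor columns
is_num <- vapply(df, is.numeric, logical(1))
names(df)[!is_num]
```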

Here is what happens if we convert a character column to numeric:

df$Police_Force <- as.numeric(df$Police_Force)
Warning message:
NAs introduced by coercion

df$Police_Force
[1] NA NA NA NA NA NA NA ....

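This does not work in R: a string such as "Metropolitan Police" cannot be parsed as a number, so every value becomes NA. However, if we first convert the column to a factor and then to numeric, the problem is solved, because as.numeric() on a factor returns its integer level codes.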

df$Police_Force <- as.numeric(as.factor(df$Police_Force))

df$Police_Force
[1] 30 30 30 30 30 30 30 ...
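The same `as.numeric(as.factor(...))` conversion can be applied to every character column in one step. A minimal sketch on a hypothetical two-column data frame; note that it encodes each data frame independently, so `train` and `test` would still need a shared set of levels for their codes to agree.

```r
# Hypothetical data frame with one character and one integer column
df <- data.frame(
  Police_Force = c("Metropolitan Police", "Kent", "Metropolitan Police"),
  Number_of_Vehicles = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Replace each character column with the codes of its factor levels;
# assigning into df[] keeps the data.frame structure and column names
df[] <- lapply(df, function(col) {
  if (is.character(col)) as.numeric(as.factor(col)) else col
})

str(df)
```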

Your approach does not work because the variables are not factors but characters.

levels(df$Road_Type)
NULL

as.numeric(levels(df$Road_Type))[levels(df$Road_Type)]
numeric(0)
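For context, `as.numeric(levels(f))[f]` is the usual idiom for a factor whose labels are themselves numbers, such as a numeric column that was read in as a factor. It does nothing useful for text categories. A short comparison on made-up vectors:

```r
# A factor whose labels are numbers: the idiom recovers the original values
f <- factor(c("10", "20", "10"))
as.numeric(levels(f))[f]   # 10 20 10
as.numeric(f)              # 1 2 1 (level codes, not the values)

# A character vector has no levels, so the idiom has nothing to work with
x <- c("Single carriageway", "Dual carriageway")
levels(x)                  # NULL

# For text categories, the factor level codes are what kNN can use
as.numeric(as.factor(x))   # 2 1
```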

Since you have not shown what your data looks like after importing it into R, I might be wrong. I used the read.csv function.
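A likely reason the columns come in as characters rather than factors: since R 4.0.0, `read.csv()` defaults to `stringsAsFactors = FALSE`, so text columns stay character unless you ask otherwise. A small sketch using inline CSV text in place of the Kaggle file:

```r
csv <- "Road_Type,Number_of_Vehicles
Single carriageway,1
Dual carriageway,2"

# Default in R >= 4.0.0: text columns are read as character
d1 <- read.csv(text = csv)
class(d1$Road_Type)   # "character" on R >= 4.0.0

# Explicitly requesting factors restores the pre-4.0.0 behaviour
d2 <- read.csv(text = csv, stringsAsFactors = TRUE)
class(d2$Road_Type)   # "factor"
levels(d2$Road_Type)  # "Dual carriageway" "Single carriageway"
```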
