Error using caret package with "knn" method: Something is wrong; all the Accuracy metric values are missing


Problem description

I am using the caret package to train a model with the knn algorithm, but I am running into an error. I am using the German credit data, and this is the structure of the data frame:

'data.frame':   1000 obs. of  21 variables:
$ checking_balance    : Factor w/ 4 levels "< 0 DM","> 200 DM",..: 1 3 4 1 1 
$ months_loan_duration: int  6 48 12 42 24 36 24 36 12 30 ...
$ credit_history      : Factor w/ 5 levels "critical","delayed",..: 1 5 1 5 
$ purpose             : Factor w/ 10 levels "business","car (new)",..: 8 8 5 
$ amount              : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 
$ savings_balance     : Factor w/ 5 levels "< 100 DM","> 1000 DM",..: 5 1 
$ employment_length   : Factor w/ 5 levels "> 7 yrs","0 - 1 yrs",..: 1 3 4 
$ installment_rate    : int  4 2 2 2 3 2 3 2 2 4 ...
$ personal_status     : Factor w/ 4 levels "divorced male",..: 4 2 4 4 4
$ other_debtors       : Factor w/ 3 levels "co-applicant",..: 3 3 3 2 3 3 
$ residence_history   : int  4 2 3 4 4 4 4 2 4 2 ...
$ property            : Factor w/ 4 levels "building society savings",..:  
$ age                 : int  67 22 49 45 53 35 53 35 61
$ installment_plan    : Factor w/ 3 levels "bank","none",..: 2 2 2 2 2 2 
$ housing             : Factor w/ 3 levels "for free","own",..: 2 2 1 2 3
$ existing_credits    : int  2 1 1 1 2 1 1 1 ...
$ default             : Factor w/ 2 levels "1","2": 1 2 1 1 2 1  1 2 ...
$ dependents          : int  1 1 2 2 2 2 1  1 ...
$ telephone           : Factor w/ 2 levels "none","yes": 2 1 1 1  2 1 1 .
$ foreign_worker      : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 ...
$ job                 : Factor w/ 4 levels "mangement self-employed",..: 2

The target variable is credit$default.

When I run this code:

cv_opts = trainControl(method="repeatedcv", repeats = 5)
model_knn<-train(trainSet[,predictors],trainSet[,outcomeName],method="knn", trControl=cv_opts)

I get this error:

Something is wrong; all the Accuracy metric values are missing:
Accuracy       Kappa    
Min.   : NA   Min.   : NA  
1st Qu.: NA   1st Qu.: NA  
Median : NA   Median : NA  
Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :3     NA's   :3    
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)

The same code works with other methods (rpart, ada), so it seems I am missing something in the trControl for knn?

Recommended answer

The problem is that knn does not know how to handle categorical predictors when you use the default S3 method of caret's train function (the x, y interface), which passes factor columns through as they are.

Example:

library(mlbench)
library(caret)
data(Servo)
summary(Servo)
 Motor  Screw  Pgain  Vgain      Class      
 A:36   A:42   3:50   1:47   Min.   : 1.00  
 B:36   B:35   4:66   2:49   1st Qu.:10.50  
 C:40   C:31   5:26   3:27   Median :18.00  
 D:22   D:30   6:25   4:22   Mean   :21.17  
 E:33   E:29          5:22   3rd Qu.:33.50  
                             Max.   :51.00  

So all the predictors are categorical.

predictors <- colnames(Servo)[1:4]
cv_opts = trainControl(method="repeatedcv", repeats = 5)
model_knn <- train(Servo[predictors],
                   Servo[,5],
                   method = "knn",
                   trControl = cv_opts)

This results in:

Something is wrong; all the RMSE metric values are missing:...
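Before picking a fix, it can help to confirm which predictors are categorical in your own data. The following is a minimal sketch in base R on a toy data frame; credit_toy and is_categorical are hypothetical names, and the values are a few illustrative entries modelled on the str() output in the question, not the real data set.

```r
# Toy stand-in for three columns of the credit data (illustrative values only).
credit_toy <- data.frame(
  checking_balance     = factor(c("< 0 DM", "> 200 DM", "< 0 DM")),
  months_loan_duration = c(6L, 48L, 12L),
  amount               = c(1169L, 5951L, 2096L)
)

# Flag every column that is a factor or a character vector.
is_categorical <- sapply(
  credit_toy,
  function(col) is.factor(col) || is.character(col)
)

# Names of the columns that knn cannot use directly as numeric inputs.
categorical_cols <- names(which(is_categorical))
print(categorical_cols)
```

If this returns any column names, those predictors need to be turned into numeric columns before they reach knn through the x, y interface.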

To overcome this, you can use the formula S3 method for train, which expands the factors into dummy variables:

model_knn <- train(Class~.,
                   data = Servo,
                   method = "knn",
                   trControl = cv_opts)

model_knn
k-Nearest Neighbors 

167 samples
  4 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 151, 149, 149, 150, 151, 151, ... 
Resampling results across tuning parameters:

  k  RMSE      Rsquared   MAE     
  5  9.124929  0.6404554  7.820686
  7  9.356812  0.6393563  7.983302
  9  9.775620  0.6169618  8.396535

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 5.

Or you can build your own model matrix and use it in the default S3 method:

Servo_X <- 
  model.matrix(Class~.-1,
               data = Servo) 

model_knn2 <- train(Servo_X,
                    Servo$Class,
                    method = "knn",
                    trControl = cv_opts)

model_knn2
k-Nearest Neighbors 

167 samples
 16 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 149, 151, 151, 150, 151, 151, ... 
Resampling results across tuning parameters:

  k  RMSE      Rsquared   MAE     
  5  9.289972  0.6310129  7.869684
  7  9.487649  0.6401052  8.021603
  9  9.908227  0.6479472  8.604000

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 5.
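To see what model.matrix does to factor columns, here is a small sketch on a made-up data frame (toy and toy_X are hypothetical names, and the values are invented for illustration). It uses the same formula shape as the Servo call above, Class ~ . - 1, which drops the intercept.

```r
# Two factor predictors and a numeric outcome (made-up values).
toy <- data.frame(
  Motor = factor(c("A", "B", "C", "A")),
  Screw = factor(c("A", "A", "B", "B")),
  Class = c(4, 11, 6, 48)
)

# Expand the factors into 0/1 indicator columns; "- 1" removes the intercept.
toy_X <- model.matrix(Class ~ . - 1, data = toy)

# The result is a numeric matrix that knn can compute distances on.
print(colnames(toy_X))
print(toy_X)
```

Because the intercept is dropped, the first factor keeps an indicator for every level (MotorA, MotorB, MotorC), while the second factor loses its reference level and contributes only ScrewB. The same rule explains why the 4 Servo factors become 16 columns above.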

Additionally, it's a good idea to use preProc = c("center", "scale") with knn, since the distance calculation assumes all the predictors are on the same scale.
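The effect of scaling on distances can be sketched in base R. The example below is an illustration only, using three amount and installment_rate values that appear in the str() output in the question; base scale() stands in here for what preProc = c("center", "scale") is meant to achieve, and it is not how caret is implemented internally.

```r
# Three rows from the str() output in the question.
x <- data.frame(
  amount           = c(1169, 5951, 2096),
  installment_rate = c(4, 2, 2)
)

# Euclidean distances on the raw columns are driven almost entirely by amount.
raw_dist         <- dist(x)
amount_only_dist <- dist(x["amount"])

# Centering and scaling gives every column mean 0 and standard deviation 1,
# so installment_rate contributes to the distance as well.
x_scaled    <- scale(x, center = TRUE, scale = TRUE)
scaled_dist <- dist(x_scaled)

# How little installment_rate changes the raw distances.
print(round(raw_dist - amount_only_dist, 4))
# Distances after both columns are put on the same scale.
print(round(scaled_dist, 3))
```

On the raw data, adding installment_rate barely changes any pairwise distance, so the nearest neighbours are effectively chosen by amount alone.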

To understand what happens when you use the formula interface, check out:

https://github.com/topepo/caret/blob/master/models/files/knn.R

and

caret:::knnreg.formula
caret:::knn3.formula
