R caret package: error imputing data with the preProcess function

Problem description

I have a dataset (split into training and testing sets) with missing data, and I would like to impute the missing values before classification.

I tried the preProcess function from the caret package. I want to impute the training set using its predictor variable, and to impute the testing set using only what was learned from the training set, without using the predictor of the testing set (that I should not know).

p = preProcess(x = training, method = "knnImpute", k = 10)
pred = predict(object = p, newdata = training)
pred1 = predict(object = p, newdata = testing)

When I run this code, I get the following error on the second line:

Error in FUN(newX[, i], ...) : 
  cannot impute when all predictors are missing in the new data point

I also tried removing the predictor variable from the training set, but the result is the same. I then tried the Iris dataset, removing some values in each column and removing the predictor, and there it works. Yet the two datasets have the same characteristics: both are data.frames and both contain only numeric values.

Recommended answer

From your words ("without using the predictor of the testing set (that I should not know)"), I conclude that by "predictor" you mean the target variable, which is itself a mistake. "Predictors" are the known features from which we wish to predict the target variable.

If I am correct, you are actually trying to predict the target variable using missing-value imputation, which is again a mistake: that is not what imputation is for. The correct use is when some (but not all) values are missing from your predictors (features), and you want to impute them so that they can be used, say, as input to an ML algorithm that does not tolerate missing values.
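
To make the intended workflow concrete, here is a minimal sketch rather than a definitive recipe. The toy data and the column names x1, x2 and y are hypothetical and are not taken from the question. The sketch assumes the caret package and the RANN package (which, as far as I know, caret relies on for knnImpute) are installed. The target y is kept out of the imputation entirely, the imputation model is fit on the training predictors only, and rows in which every predictor is missing are flagged first, since those are the rows the error message complains about.

```r
# Hypothetical toy data: x1 and x2 are predictors (features), y is the target.
set.seed(1)
training <- data.frame(x1 = rnorm(30), x2 = rnorm(30),
                       y = factor(sample(c("a", "b"), 30, replace = TRUE)))
testing  <- data.frame(x1 = rnorm(10), x2 = rnorm(10),
                       y = factor(sample(c("a", "b"), 10, replace = TRUE)))

# Remove some, but never all, predictor values in a row.
training$x1[c(2, 7)] <- NA
training$x2[c(4, 9)] <- NA
testing$x1[3] <- NA
testing$x2[5] <- NA

predictors <- c("x1", "x2")

# TRUE for rows in which every column is NA; such rows leave k-NN
# imputation with nothing to measure distances on.
all_missing <- function(df) rowSums(!is.na(df)) == 0

# Keep only the predictors and drop rows that cannot be imputed.
train_x <- training[!all_missing(training[predictors]), predictors, drop = FALSE]
test_x  <- testing[!all_missing(testing[predictors]), predictors, drop = FALSE]

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("RANN", quietly = TRUE)) {
  # Fit the imputation model on the training predictors only ...
  p <- caret::preProcess(train_x, method = "knnImpute", k = 5)
  # ... then apply the same fitted model to both sets.
  train_imp <- predict(p, newdata = train_x)
  test_imp  <- predict(p, newdata = test_x)
}
```

Note that knnImpute also centers and scales the predictors, so the imputed data frames are on a standardized scale. The target column y is untouched and can be reattached afterwards for classification.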
