R caret / rfe variable selection for factors() AND NAs

Question

I have a data set with NAs sprinkled generously throughout.

In addition, it has columns that need to be factors.

I am using the rfe() function from the caret package to select variables.

It seems that in rfe(), passing functions = lmFuncs works for data with NAs but NOT for factor variables, while functions = rfFuncs works for factor variables but NOT for NAs.

Any suggestions for dealing with this?

I tried model.matrix() but it seems to just cause more problems.

Answer

Because of inconsistent behavior on these points between packages, not to mention the extra trickiness when going to more "meta" packages like caret, I always find it easier to deal with NAs and factor variables up front, before I do any machine learning.

  • For NAs, either omit or impute (median, knn, etc.).
  • For factor features, you were on the right track with model.matrix(). It generates a series of "dummy" (0/1 indicator) features, one per level of the factor; the - 1 in the formula drops the intercept so that every level, including the first, gets its own column. The typical usage is something like this:
> dat = data.frame(x=factor(rep(1:3, each=5)))
> dat$x
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
Levels: 1 2 3
> model.matrix(~ x - 1, data=dat)
   x1 x2 x3
1   1  0  0
2   1  0  0
3   1  0  0
4   1  0  0
5   1  0  0
6   0  1  0
7   0  1  0
8   0  1  0
9   0  1  0
10  0  1  0
11  0  0  1
12  0  0  1
13  0  0  1
14  0  0  1
15  0  0  1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"
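To make the two bullets concrete, here is a minimal base-R sketch on a made-up data frame (the columns num and grp and the helper impute_median are hypothetical, not from the question). It illustrates one likely reason model.matrix() "causes more problems" when NAs are present: it builds its input through model.frame(), whose default na.action is normally na.omit, so incomplete rows are dropped silently and the resulting matrix no longer lines up with the outcome vector. Imputing first avoids that.

```r
# Toy data: one numeric column with NAs and one factor column
dat <- data.frame(
  num = c(1.5, NA, 3.0, 4.5, NA, 6.0),
  grp = factor(c("a", "b", "c", "a", "b", "c"))
)

# Pitfall: model.matrix() goes through model.frame(), whose default
# na.action (usually na.omit) silently drops the incomplete rows
mm_raw <- model.matrix(~ num + grp - 1, data = dat)
nrow(mm_raw)  # 4 rows, not 6

# Fix: impute first (here a simple per-column median), then encode
impute_median <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}
num_cols <- vapply(dat, is.numeric, logical(1))
dat[num_cols] <- lapply(dat[num_cols], impute_median)

# All 6 rows survive; columns are num, grpa, grpb, grpc
mm <- model.matrix(~ num + grp - 1, data = dat)
```

Median imputation is only the simplest choice here; a knn-based or other imputer would slot in at the same step, before the factors are expanded.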

Also, just in case you haven't (although it sounds like you have), the caret vignettes on CRAN are very nice and touch on some of these points. http://cran.r-project.org/web/packages/caret/index.html
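The same up-front steps can also be done with caret's own preprocessing helpers before calling rfe(). This is a minimal sketch, assuming caret is installed and provides preProcess(method = "medianImpute"), dummyVars() and rfe() as used below; the simulated data, the column names, and the choice of lmFuncs with 5-fold CV are illustrative assumptions. It also assumes NAs occur only in numeric columns; NAs inside a factor would need separate handling.

```r
library(caret)

# Simulated data (hypothetical): two numeric predictors and one factor
set.seed(1)
n <- 100
df <- data.frame(
  num1 = rnorm(n),
  num2 = rnorm(n),
  grp  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
y <- 2 * df$num1 + ifelse(df$grp == "b", 1, 0) + rnorm(n)

# Sprinkle NAs into the numeric columns only
df$num1[sample(n, 10)] <- NA
df$num2[sample(n, 10)] <- NA

# Step 1: median-impute the numeric columns
num_cols <- sapply(df, is.numeric)
pp <- preProcess(df[, num_cols, drop = FALSE], method = "medianImpute")
df[, num_cols] <- predict(pp, df[, num_cols, drop = FALSE])

# Step 2: expand factors into dummy columns; fullRank = TRUE drops one
# level per factor so the dummies are not collinear with lm's intercept
dv <- dummyVars(~ ., data = df, fullRank = TRUE)
x <- predict(dv, newdata = df)  # numeric matrix, no NAs

# Step 3: recursive feature elimination on the all-numeric matrix
ctrl <- rfeControl(functions = lmFuncs, method = "cv", number = 5)
fit <- rfe(x, y, sizes = 1:3, rfeControl = ctrl)
predictors(fit)
```

Once x is an all-numeric, NA-free matrix, swapping lmFuncs for rfFuncs in rfeControl() should not require any further changes; with random forests the full set of dummies (fullRank = FALSE) is also acceptable.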
