How to preProcess features when some of them are factors?
Question
My question is related to this one regarding categorical data (factors, in R terms) when using the caret package. I understand from the linked post that if you use the "formula interface", some features can be factors and the training will work fine. My question is: how can I scale the data with the preProcess() function? If I try to do it on a data frame that has some factor columns, I get this error message:
Error in preProcess.default(etitanic, method = c("center", "scale")) :
all columns of x must be numeric
Here is some sample code:
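library(caret)  # provides preProcess()
library(earth)  # provides the etitanic data set
data(etitanic)
# This call raises the error above, because pclass and sex are factors
a <- preProcess(etitanic, method = c("center", "scale"))
# predict() takes the preProcess object first, then the data
b <- predict(a, etitanic)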
Thanks.
Recommended answer
It is really the same issue as the post you link to. preProcess works only on numeric data, and you have:
> str(etitanic)
'data.frame': 1046 obs. of 6 variables:
$ pclass : Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ...
$ survived: int 1 1 0 0 0 1 1 0 1 0 ...
$ sex : Factor w/ 2 levels "female","male": 1 2 1 2 1 2 1 2 1 2 ...
$ age : num 29 0.917 2 30 25 ...
$ sibsp : int 0 1 1 1 1 0 1 0 2 0 ...
$ parch : int 0 2 2 2 2 0 0 0 0 0 ...
You can't center and scale pclass or sex as-is, so they need to be converted to dummy variables. You can use model.matrix or caret's dummyVars to do this:
> new <- model.matrix(survived ~ . - 1, data = etitanic)
> colnames(new)
[1] "pclass1st" "pclass2nd" "pclass3rd" "sexmale" "age"
[6] "sibsp" "parch"
The -1 in the formula gets rid of the intercept. Now you can run preProcess on this object.
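As a minimal sketch of that last step, the code below runs preProcess on a model matrix and then applies the result with predict. The toy data frame and its values are hypothetical stand-ins for etitanic, used only so the example does not depend on the earth package.

library(caret)

# Hypothetical stand-in for etitanic (values invented for illustration)
toy <- data.frame(
  pclass   = factor(c("1st", "2nd", "3rd", "1st", "3rd", "2nd")),
  survived = c(1L, 0L, 0L, 1L, 1L, 0L),
  sex      = factor(c("female", "male", "male", "female", "male", "female")),
  age      = c(29, 40, 2, 30, 25, 51)
)

# Expand the factors into dummy columns; -1 drops the intercept
x <- model.matrix(survived ~ . - 1, data = toy)

# Estimate the centering and scaling parameters from the numeric matrix
pp <- preProcess(x, method = c("center", "scale"))

# Apply them: the preProcess object goes first, the data second
x_scaled <- predict(pp, x)

The same pp object can be passed to predict with new data that has the same columns, so a test set is scaled with the training-set means and standard deviations.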
By the way, making preProcess ignore non-numeric data is on my to-do list, but it might cause errors for people not paying attention.
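For the dummyVars route mentioned above, a rough sketch might look like the following. Again, the toy data frame is a hypothetical stand-in for etitanic. With its default settings dummyVars keeps one column per factor level, so sex yields both a female and a male column, unlike the model.matrix output shown earlier.

library(caret)

# Hypothetical stand-in for etitanic (values invented for illustration)
toy <- data.frame(
  pclass   = factor(c("1st", "2nd", "3rd", "1st", "3rd", "2nd")),
  survived = c(1L, 0L, 0L, 1L, 1L, 0L),
  sex      = factor(c("female", "male", "male", "female", "male", "female")),
  age      = c(29, 40, 2, 30, 25, 51)
)

# Build the dummy-variable encoder from the formula, then apply it
dv <- dummyVars(survived ~ ., data = toy)
x <- predict(dv, newdata = toy)

# Every column is now numeric, so centering and scaling will run
pp <- preProcess(x, method = c("center", "scale"))
x_scaled <- predict(pp, x)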
Max