Dummy variables and preProcess
Question
I have a data frame with some dummy variables that I want to use as the training set for glmnet.
Since I'm using glmnet, I want to center and scale the features using the preProcess option of the caret train function. However, I don't want this transformation to be applied to the dummy variables.
Is there a way to prevent the transformation of these variables?
Answer
There is currently no way to do this other than writing a custom model (see the example combining PLS and RF near the end of the caret documentation on custom models).
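Whatever form the custom code takes, its core job is to estimate centering and scaling parameters on the training data for the continuous predictors only, and to pass the dummy columns through unchanged. The sketch below illustrates that logic in Python rather than R, so it is not caret code. The function names `center_scale_except` and `apply_center_scale`, the dict-of-columns data layout, and the explicit `skip` set are assumptions made for illustration. It also assumes no processed column is constant, since that would make the standard deviation zero.

```python
import statistics


def center_scale_except(data, skip):
    """Center and scale every column in `data` except those named in `skip`.

    `data` maps column names to lists of numbers. Returns the transformed
    columns and the (mean, sd) pair estimated for each processed column,
    so the same transformation can later be applied to new data.
    Assumes each processed column has >= 2 values and is not constant.
    """
    params = {}
    out = {}
    for name, values in data.items():
        if name in skip:
            # Dummy (0/1) column: copy it through unchanged.
            out[name] = list(values)
            continue
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
        params[name] = (mean, sd)
        out[name] = [(v - mean) / sd for v in values]
    return out, params


def apply_center_scale(data, params):
    """Apply previously estimated (mean, sd) pairs; other columns pass through."""
    out = {}
    for name, values in data.items():
        if name in params:
            mean, sd = params[name]
            out[name] = [(v - mean) / sd for v in values]
        else:
            out[name] = list(values)
    return out


if __name__ == "__main__":
    train = {"x": [1.0, 2.0, 3.0], "grp_b": [0, 1, 1]}
    scaled, params = center_scale_except(train, skip={"grp_b"})
    print(scaled)  # {'x': [-1.0, 0.0, 1.0], 'grp_b': [0, 1, 1]}
    print(apply_center_scale({"x": [4.0], "grp_b": [1]}, params))
```

Keeping the estimated parameters separate from the transformation mirrors how a caret preProcess object is estimated on the training data and then applied to new data with predict().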
I'm working on a method to specify which variables get which pre-processing method. However, this is tough with dummy variables, since you might need to specify the names of many predictors whose columns are not yet in the current data set (they only appear once the factors are expanded into dummy variables). The idea is to be able to use wildcards (e.g. Species* to capture Speciesversicolor and Speciesvirginica), but the code isn't quite there yet.
Max
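The wildcard idea in the answer can be sketched as follows: patterns are written before the dummy columns exist and are expanded against the real column names only after encoding. This is a hypothetical Python illustration using the standard `fnmatch` module, not caret's (unfinished) implementation; `resolve_patterns` and `split_columns` are made-up helper names.

```python
from fnmatch import fnmatchcase


def resolve_patterns(columns, patterns):
    """Return the columns matching any wildcard pattern (e.g. "Species*").

    Matching is done against the column names that exist after dummy
    encoding, so patterns can be specified before those columns exist.
    """
    return [c for c in columns if any(fnmatchcase(c, p) for p in patterns)]


def split_columns(columns, skip_patterns):
    """Split columns into (to_preprocess, left_untouched) using wildcards."""
    skipped = resolve_patterns(columns, skip_patterns)
    to_process = [c for c in columns if c not in skipped]
    return to_process, skipped


if __name__ == "__main__":
    # Column names as they might look after encoding the iris Species factor.
    cols = ["Sepal.Length", "Sepal.Width", "Speciesversicolor", "Speciesvirginica"]
    print(split_columns(cols, ["Species*"]))
```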