Dummy variables and preProcess


Question

I have a data frame with some dummy variables that I want to use as a training set for glmnet.

Since I'm using glmnet, I want to center and scale the features using the preProcess option in the caret train function. I don't want this transformation to be applied to the dummy variables as well.

Is there a way to prevent the transformation of these variables?

Recommended answer

There is currently no way to do this other than writing a custom model (see the example with PLS and RF near the end).

I'm working on a method to specify which variables get which pre-processing method. However, with dummy variables this is tough, since you might need to specify the names of many predictors whose columns are not in the current data set. The idea is to be able to use wildcards (e.g. Species* to capture Speciesversicolor and Speciesvirginica), but the code isn't quite there yet.

Max
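To make the wildcard idea from the answer concrete, here is a minimal sketch in base R. It is not the answer's method and it does not use the train preProcess option. It assumes the dummy columns can be recognised by a name prefix (here the hypothetical pattern "^Species"), and it centers and scales the remaining columns by hand with scale(). The iris data and the variable names are chosen purely for illustration.

```r
# Build a predictor matrix in which the factor Species is expanded into
# dummy columns (Speciesversicolor, Speciesvirginica); drop the intercept.
x <- model.matrix(Sepal.Length ~ ., data = iris)[, -1]

# Assumption: dummy columns are identified by a name prefix, mimicking
# the wildcard Species* mentioned in the answer.
is_dummy <- grepl("^Species", colnames(x))

# Center and scale only the non-dummy columns; leave the dummies untouched.
x_scaled <- x
x_scaled[, !is_dummy] <- scale(x[, !is_dummy])

# Inspect the result: continuous columns now have mean 0 and sd 1,
# while the dummy columns still contain only 0 and 1.
round(colMeans(x_scaled), 3)
```

Two caveats apply to this sketch. Scaling done ahead of time is not re-estimated inside each resample the way the preProcess option of train would be. And if the matrix is passed to glmnet directly, its standardize argument (TRUE by default) would have to be set to FALSE, otherwise every column, dummies included, is standardized again.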
