After generating dummy variables?

Question

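I am trying to convert the categorical variables into dummy variables. My data contains the variables "season", "holiday", "workingday", "weather", "temp", "atemp", "humidity", "windspeed", "registered", "count", "hour" and "dow", of which "season", "holiday", "workingday", "weather", "hour" and "dow" are categorical.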

Here is my code:

# create dummy variables with the dummies package
library(dummies)
# copy the data and append one block of 0/1 columns per categorical variable
data.new = data.frame(data)
data.new = cbind(data.new, dummy(data.new$season, sep = "_"))
data.new = cbind(data.new, dummy(data.new$holiday, sep = "_"))
data.new = cbind(data.new, dummy(data.new$weather, sep = "_"))
data.new = cbind(data.new, dummy(data.new$dow, sep = "_"))
data.new = cbind(data.new, dummy(data.new$hour, sep = "_"))
data.new = cbind(data.new, dummy(data.new$workingday, sep = "_"))
# delete the old variables by name rather than by position,
# so the result does not depend on the column order
old.vars = c("season", "holiday", "workingday", "weather", "hour", "dow")
data.new = data.new[, !(names(data.new) %in% old.vars)]
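If the dummies package is unavailable, a similar set of indicator columns can be built with base R's model.matrix(). This is a minimal sketch on made-up toy data: only "season" and "weather" are used, and the name_level column naming is an assumption chosen for illustration, not necessarily what dummy() produces.

```r
# Hypothetical toy data standing in for the real data frame;
# column names follow the question, the values are made up.
data <- data.frame(
  season  = c(1, 2, 3, 4, 1, 2),
  weather = c(1, 1, 2, 3, 2, 1),
  temp    = c(9.8, 14.1, 25.4, 12.3, 8.2, 16.0),
  count   = c(16, 40, 120, 35, 12, 60)
)

# Columns that hold category codes rather than quantities (assumed)
cat_vars <- c("season", "weather")

# One 0/1 indicator column per level of each categorical variable.
# "~ f - 1" drops the intercept so every level gets its own column.
dummy_cols <- do.call(cbind, lapply(cat_vars, function(v) {
  f <- factor(data[[v]])
  m <- model.matrix(~ f - 1, data = data.frame(f = f))
  colnames(m) <- paste(v, levels(f), sep = "_")
  m
}))

# Keep the non-categorical columns and append the indicators,
# dropping the original categorical columns by name.
data.new <- cbind(data[, setdiff(names(data), cat_vars)], dummy_cols)
str(data.new)
```

Each row has exactly one 1 among the season_* columns and one among the weather_* columns.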

Should I delete the old variables after generating the dummy variables? If I want to do PCR, can I use all the variables, e.g.

fit = pcr(count~.,data = data.new) 

to fit a linear regression model? Or should I use only the non-dummy variables?

fit = pcr(count~temp+atemp+humidity+windspeed+registered,data = data.new)

Sorry for the confusion: I originally used the lm function as an example and have now changed it to the pcr function. Thank you for reading this question!
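On the question of deleting the old variables: for an ordinary least-squares fit with lm(), keeping an integer-coded column next to a full set of its own dummies makes the predictors linearly dependent, so some coefficients cannot be estimated. The sketch below uses hypothetical toy data and covers lm() only.

```r
# Hypothetical toy data: one categorical column stored as integer codes
d <- data.frame(
  season = c(1, 2, 3, 1, 2, 3, 1, 2),
  y      = c(5.1, 7.9, 12.2, 4.8, 8.3, 11.7, 5.5, 8.0)
)

# Add one 0/1 indicator per level while KEEPING the original column
for (lev in sort(unique(d$season))) {
  d[[paste0("season_", lev)]] <- as.integer(d$season == lev)
}

# Intercept + all indicators + original codes are linearly dependent,
# so lm() drops the aliased columns and reports NA for them
fit <- lm(y ~ ., data = d)
coef(fit)
```

Removing the original column and one of the indicators (or letting lm() build the dummies from a factor) gives a full-rank design.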

Answer

As long as your categorical variables are factors, the lm function will handle the creation of dummy variables for you.
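The expansion lm() performs can be inspected with model.matrix(), which builds the same design matrix from the formula. A small sketch on hypothetical toy data: with R's default treatment contrasts, the first factor level is the baseline and each remaining level gets its own 0/1 column.

```r
# Hypothetical toy data: weather is a factor, temp is numeric
d <- data.frame(
  weather = factor(c("clear", "mist", "rain", "clear", "mist", "rain")),
  temp    = c(20, 18, 12, 22, 17, 11),
  count   = c(150, 110, 40, 170, 100, 35)
)

# Design matrix built from the formula; "clear" is the baseline level
X <- model.matrix(count ~ weather + temp, data = d)
colnames(X)   # "(Intercept)" "weathermist" "weatherrain" "temp"

# lm() uses the same expansion, so no manual dummies are needed
fit <- lm(count ~ weather + temp, data = d)
coef(fit)
```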

I recommend first verifying that your data is a data.frame and that the categorical predictors are indeed factors.

class(data)
sapply(data, class)

or, more simply,

str(data)
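If str() reports the categorical columns as int or num (for example because they are stored as integer codes, which is an assumption about this data set), they can be converted to factors before fitting. A minimal sketch on hypothetical toy data:

```r
# Hypothetical toy data in which categories are stored as integer codes
data <- data.frame(
  season  = c(1L, 2L, 3L, 4L),
  holiday = c(0L, 0L, 1L, 0L),
  temp    = c(9.8, 14.1, 25.4, 12.3)
)

# Columns that should be treated as categories (assumed)
cat_vars <- c("season", "holiday")

# Convert each listed column to a factor in place
data[cat_vars] <- lapply(data[cat_vars], factor)

sapply(data, class)
```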

Then simply put them in the formula of your lm call.

fit = lm(count ~ season + holiday + workingday + weather + temp + atemp + humidity + windspeed + registered + hour + dow, data=data)

Or, if the columns in the formula are the only columns in your data.frame, you can use the shorthand.

fit = lm(count ~ ., data=data)
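The shorthand works because "." in a formula expands to every column of the data frame except the response, and "- name" removes a column from that expansion. A small sketch on hypothetical toy data comparing the two forms:

```r
# Hypothetical toy data: only the response and the predictors are present
d <- data.frame(
  season = factor(c(1, 2, 3, 1, 2, 3, 1, 2)),
  temp   = c(9, 15, 25, 10, 16, 24, 8, 14),
  count  = c(20, 45, 90, 25, 50, 85, 18, 40)
)

# "." expands to season + temp here, so both fits are identical
fit_dot  <- lm(count ~ ., data = d)
fit_full <- lm(count ~ season + temp, data = d)
all.equal(coef(fit_dot), coef(fit_full))   # TRUE

# "- temp" excludes a column that should not be a predictor
fit_less <- lm(count ~ . - temp, data = d)
names(coef(fit_less))   # "(Intercept)" "season2" "season3"
```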
