How do you remove an insignificant factor level from a regression using the lm() function in R?


Question

When I run a regression in R and declare a variable as a factor, I don't have to set up the categorical (dummy) variables in the data myself. But how do I remove a factor level that is not significant from the regression, so that only the significant variables are shown?

For example:

dependent <- c(1:10)
independent1 <- as.factor(c('d','a','a','a','a','a','a','b','b','c'))
independent2 <- c(-0.71,0.30,1.32,0.30,2.78,0.85,-0.25,-1.08,-0.94,1.33)
output <- lm(dependent ~ independent1+independent2)
summary(output)

Which results in the following regression model:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     4.6180     1.0398   4.441  0.00676 **
independent1b   3.7471     2.1477   1.745  0.14148   
independent1c   5.5597     2.0736   2.681  0.04376 * 
independent1d  -3.7129     2.3984  -1.548  0.18230   
independent2   -0.1336     0.7880  -0.170  0.87203   

If I want to pull out the independent1 levels that are insignificant (b, d), is there a way that I can do that?

In this case setting up the data with dummy variables by hand is easy, but when I'm including week numbers or another factor with a lot of levels it becomes inconvenient.

Here is the way to build the model using hand-coded dummy variables. As you can see, structuring the data is more of a pain, but it also gives me more control.

regressionData <- data.frame(cbind(1:10,c(-0.71,0.30,1.32,0.30,2.78,0.85,-0.25,-1.08,-0.94,1.33),c(0,1,1,1,1,1,1,0,0,0),c(0,0,0,0,0,0,0,1,1,0),c(0,0,0,0,0,0,0,0,0,1),c(1,0,0,0,0,0,0,0,0,0)))

# Name the columns of the data frame (not the earlier lm object `output`)
names(regressionData) <- c('dependent','independent2','independenta', 'independentb','independentc','independentd')

attach(regressionData)

result <- lm(dependent~independent2+independentb+independentc+independentd)
summary(result)
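The 0/1 columns above are typed in by hand, which is the step that gets painful for a factor with many levels. As a minimal sketch (not part of the original question), base R's model.matrix() can generate the same indicator columns from the factor. The column names are chosen here only to match the ones used above.

```r
# Rebuild the question's factor and let model.matrix() create the 0/1 columns.
independent1 <- factor(c('d','a','a','a','a','a','a','b','b','c'))

# "- 1" removes the intercept so that every level gets its own column.
dummies <- model.matrix(~ independent1 - 1)
colnames(dummies) <- paste0("independent", levels(independent1))

regressionData <- data.frame(
  dependent    = 1:10,
  independent2 = c(-0.71,0.30,1.32,0.30,2.78,0.85,-0.25,-1.08,-0.94,1.33),
  dummies
)

# Passing data= avoids attach(); level "a" is left out as the reference level.
result <- lm(dependent ~ independent2 + independentb + independentc + independentd,
             data = regressionData)
summary(result)
```

Individual indicator columns can then be dropped from the formula one at a time, exactly as with the hand-built data frame.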

Now I can remove independent2, since it's insignificant:

result <- lm(dependent~independentb+independentc+independentd)
summary(result)

I'll remove independentd next, since it's not significant:

result <- lm(dependent~independentb+independentc)
summary(result)

But in this case the adjusted R-squared drops (I'm not even going to do the partial F-test), since the factor would be significant. In many cases this is not true, however, and I need to remove the categorical variable from the regression because it's eating up degrees of freedom, which are important in this case, and potentially masking the value of other variables that are significant.

Recommended answer

You can remove levels of a factor variable using the exclude option of factor():

lm(dependent ~ factor(independent1, exclude=c('b','d')) + independent2)

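This way the levels b and d will not be included in the regression. Note that exclude turns those values into NA, so with the default na.action (na.omit) lm() drops the observations at levels b and d from the fit rather than pooling them into the baseline.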
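To see what exclude actually does to the data, here is a small sketch on the question's variables. The second half shows an alternative that is not part of the answer above: merging b and d into the reference level a, so that all ten observations stay in the fit. The names trimmed and merged are made up for this illustration.

```r
dependent    <- 1:10
independent1 <- factor(c('d','a','a','a','a','a','a','b','b','c'))
independent2 <- c(-0.71,0.30,1.32,0.30,2.78,0.85,-0.25,-1.08,-0.94,1.33)

# as.character() so that exclude is matched against the level labels.
trimmed <- factor(as.character(independent1), exclude = c('b','d'))
levels(trimmed)        # levels that remain
sum(is.na(trimmed))    # observations that became NA

fit_excluded <- lm(dependent ~ trimmed + independent2)
nobs(fit_excluded)     # rows actually used in the fit

# Alternative (an assumption about what may be wanted, not the answer's method):
# relabel b and d as "a" so they share the baseline and no rows are lost.
merged <- independent1
levels(merged)[levels(merged) %in% c('b','d')] <- 'a'

fit_merged <- lm(dependent ~ merged + independent2)
nobs(fit_merged)
```

Which of the two is appropriate depends on whether the b and d observations should be discarded or treated as indistinguishable from the baseline.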

Cheers
