Dummy variables for logistic regression in R


Question

I am running a logistic regression on three factors that are all binary.

My data:

table1 <- expand.grid(Crime = factor(c("Shoplifting", "Other Theft Acts")),
                      Gender = factor(c("Men", "Women")),
                      Priorconv = factor(c("N", "P")))
table1 <- data.frame(table1,
                     Yes = c(24, 52, 48, 22, 17, 60, 15, 4),
                     No = c(1, 9, 3, 2, 6, 34, 6, 3))

And the model:

fit4<-glm(cbind(Yes,No)~Priorconv+Crime+Priorconv:Crime,data=table1,family=binomial)
summary(fit4)

R seems to take 1 for prior conviction P and 1 for crime shoplifting. As a result the interaction effect is only 1 if both of the above are 1. I would now like to try different combinations for the interaction term, for example I would like to see what it would be if prior conviction is P and crime is not shoplifting.

Is there a way to make R take different cases for the 1s and the 0s? It would facilitate my analysis greatly.

Thanks.

Recommended answer

You're already getting all four combinations of the two categorical variables in your regression. You can see this as follows:

Here are your regression results:

Call:
glm(formula = cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime, 
    family = binomial, data = table1)

Coefficients:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   1.9062     0.3231   5.899 3.66e-09 ***
PriorconvP                   -1.3582     0.3835  -3.542 0.000398 ***
CrimeShoplifting              0.9842     0.6069   1.622 0.104863    
PriorconvP:CrimeShoplifting  -0.5513     0.7249  -0.761 0.446942  

So, for Priorconv, the reference category (the one with dummy value = 0) is N, and for Crime the reference category is Other Theft Acts (shortened to "Other" below). Here's how to interpret the regression results for each of the four possibilities, where log(p/(1-p)) is the log of the odds of a Yes result:

1. PriorConv = N and Crime = Other. This is just the case where both dummies are 
    zero, so your regression is just the intercept:

log(p/(1-p)) = 1.90

2. PriorConv = P and Crime = Other. So the Priorconv dummy equals 1 and the 
   Crime dummy is still zero:

log(p/(1-p)) = 1.90 - 1.36

3. PriorConv = N and Crime = Shoplifting. So the Priorconv dummy is 0 and the 
   Crime dummy is now 1:

log(p/(1-p)) = 1.90 + 0.98

4. PriorConv = P and Crime = Shoplifting. Now both dummies are 1:

log(p/(1-p)) = 1.90 - 1.36 + 0.98 - 0.55
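The four cases above can be checked directly in R. The following is a minimal sketch that rebuilds the data and model from the question; the object name `cells` is only an illustrative helper and not part of the original post. It asks `predict()` for the log-odds in each Priorconv x Crime combination and converts them to probabilities with `plogis()`, the inverse logit.

```r
# Rebuild the data and model from the question so the block runs on its own.
table1 <- expand.grid(Crime = factor(c("Shoplifting", "Other Theft Acts")),
                      Gender = factor(c("Men", "Women")),
                      Priorconv = factor(c("N", "P")))
table1 <- data.frame(table1,
                     Yes = c(24, 52, 48, 22, 17, 60, 15, 4),
                     No = c(1, 9, 3, 2, 6, 34, 6, 3))
fit4 <- glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
            data = table1, family = binomial)

# The first level of each factor is the reference category (dummy = 0).
levels(table1$Priorconv)
levels(table1$Crime)

# One row per combination, in the same order as cases 1 to 4 above
# (`cells` is a hypothetical helper name).
cells <- expand.grid(Priorconv = levels(table1$Priorconv),
                     Crime = levels(table1$Crime))

# Log-odds for each cell: the sum of the coefficients whose dummies are 1.
cells$log_odds <- predict(fit4, newdata = cells, type = "link")

# Inverse logit turns log-odds into the probability of a Yes outcome.
cells$prob_yes <- plogis(cells$log_odds)
cells

# Case 4 by hand: all dummies are 1, so every coefficient is added.
sum(coef(fit4))
```

The last line should match the `log_odds` value in the fourth row of `cells`, since that row corresponds to PriorConv = P and Crime = Shoplifting.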

You can reorder the factor values of the two predictor variables, but that will just change which combinations of variables fall into each of the four cases above.
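For the specific combination asked about in the question (prior conviction P and a crime that is not shoplifting), one option is to make Shoplifting the reference level of Crime with `relevel()`. The sketch below assumes the data from the question; `table3` and `fit_other` are illustrative names, not from the original post. After releveling, the interaction dummy equals 1 exactly when Priorconv = P and Crime = Other Theft Acts.

```r
# Rebuild the data and the original model from the question.
table1 <- expand.grid(Crime = factor(c("Shoplifting", "Other Theft Acts")),
                      Gender = factor(c("Men", "Women")),
                      Priorconv = factor(c("N", "P")))
table1 <- data.frame(table1,
                     Yes = c(24, 52, 48, 22, 17, 60, 15, 4),
                     No = c(1, 9, 3, 2, 6, 34, 6, 3))
fit4 <- glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
            data = table1, family = binomial)

# Copy the data and make "Shoplifting" the reference level of Crime
# (table3 is a hypothetical name used only for this illustration).
table3 <- table1
table3$Crime <- relevel(table3$Crime, ref = "Shoplifting")

# Same formula, refit on the releveled data.
fit_other <- glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
                 data = table3, family = binomial)

# The Crime dummy is now 1 for "Other Theft Acts", so the interaction
# term is named after P combined with "Other Theft Acts".
coef(fit_other)

# Compare the interaction estimates of the two parameterizations.
coef(fit4)[["PriorconvP:CrimeShoplifting"]]
coef(fit_other)[["PriorconvP:CrimeOther Theft Acts"]]
```

With two binary factors, flipping the reference level of one of them should only change the sign of the interaction coefficient, which the last two lines let you verify.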

Update: Regarding the issue of regression coefficients relative to ordering of the factors. Changing the reference level will change the coefficients, because the coefficients will represent contrasts between different combinations of categories, but it won't change the predicted probabilities of a Yes or No outcome. (Regression modeling wouldn't be all that credible if you could change the predictions just by changing the reference category.) Note, for example, that the predicted probabilities are the same even if we switch the reference category for Priorconv:

m1 = glm(cbind(Yes,No)~Priorconv+Crime+Priorconv:Crime,data=table1,family=binomial)
predict(m1, type="response")

1         2         3         4         5         6         7         8 
0.9473684 0.8705882 0.9473684 0.8705882 0.7272727 0.6336634 0.7272727 0.6336634 

table2 = table1
table2$Priorconv = relevel(table2$Priorconv, ref = "P")

m2 = glm(cbind(Yes,No)~Priorconv+Crime+Priorconv:Crime,data=table2,family=binomial)
predict(m2, type="response")

1         2         3         4         5         6         7         8 
0.9473684 0.8705882 0.9473684 0.8705882 0.7272727 0.6336634 0.7272727 0.6336634 
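To see both halves of this point at once, the two fits can be compared programmatically. This is a small sketch that reuses the `m1` and `m2` definitions above and rebuilds the data so it runs on its own; nothing in it goes beyond base R.

```r
# Rebuild the data from the question.
table1 <- expand.grid(Crime = factor(c("Shoplifting", "Other Theft Acts")),
                      Gender = factor(c("Men", "Women")),
                      Priorconv = factor(c("N", "P")))
table1 <- data.frame(table1,
                     Yes = c(24, 52, 48, 22, 17, 60, 15, 4),
                     No = c(1, 9, 3, 2, 6, 34, 6, 3))

# m1: default reference levels (Priorconv = N).
m1 <- glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
          data = table1, family = binomial)

# m2: same model with P as the reference level of Priorconv.
table2 <- table1
table2$Priorconv <- relevel(table2$Priorconv, ref = "P")
m2 <- glm(cbind(Yes, No) ~ Priorconv + Crime + Priorconv:Crime,
          data = table2, family = binomial)

# The coefficients differ because they measure different contrasts.
coef(m1)
coef(m2)

# The predicted probabilities agree up to numerical tolerance.
all.equal(predict(m1, type = "response"), predict(m2, type = "response"))

# The overall fit is unchanged as well.
c(deviance(m1), deviance(m2))
```

The coefficient vectors have different names and values, while the predictions and deviance should be identical, because both models describe the same four cell probabilities.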
