Does Quasi Separation matter in R binomial GLM?


Question

I am learning how quasi-separation affects an R binomial GLM, and I am starting to think that it does not matter in some circumstances.

In my understanding, we say that the data has quasi-separation when some linear combination of factor levels can completely identify failure/non-failure.

So I created an artificial dataset with quasi-separation in R:

fail   <- c(100, 100, 100, 100)
nofail <- c(100, 100,   0, 100)   # obs 3 never has a "no fail" outcome
x1 <- c(1, 0, 1, 0)
x2 <- c(0, 0, 1, 1)
data <- data.frame(fail, nofail, x1, x2)
rownames(data) <- paste("obs", 1:4)

Then, when x1=1 and x2=1 (obs 3), the data always fails. In this data, my covariate matrix has three columns: intercept, x1 and x2.

In my understanding, quasi-separation results in an estimate of infinite value, so the glm fit should fail. However, the following glm fit does NOT fail:

summary(glm(cbind(fail,nofail)~x1+x2,data=data,family=binomial))

The result is:

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.4342     0.1318  -3.294 0.000986 ***
x1            0.8684     0.1660   5.231 1.69e-07 ***
x2            0.8684     0.1660   5.231 1.69e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The Std. Error seems very reasonable even with the quasi-separation. Could anyone tell me why the quasi-separation is NOT affecting the glm fit result?

Answer

You have constructed an interesting example, but you are not testing a model that actually examines the situation you describe as quasi-separation. When you say "when x1=1 and x2=1 (obs 3) the data always fails", you are implying the need for an interaction term in the model. Notice that this produces a "more interesting" result:

> summary(glm(cbind(fail,nofail)~x1*x2,data=data,family=binomial))

Call:
glm(formula = cbind(fail, nofail) ~ x1 * x2, family = binomial, 
    data = data)

Deviance Residuals: 
[1]  0  0  0  0

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.367e-17  1.414e-01   0.000        1
x1           2.675e-17  2.000e-01   0.000        1
x2           2.965e-17  2.000e-01   0.000        1
x1:x2        2.731e+01  5.169e+04   0.001        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.2429e+02  on 3  degrees of freedom
Residual deviance: 2.7538e-10  on 0  degrees of freedom
AIC: 25.257

Number of Fisher Scoring iterations: 22
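
The 22 Fisher scoring iterations (versus the usual handful) are themselves a hint that an estimate is drifting toward infinity. A sketch comparing the fitted probabilities of the two models, using the same artificial data as above:

```r
# Same artificial data as in the question.
fail   <- c(100, 100, 100, 100)
nofail <- c(100, 100,   0, 100)
x1 <- c(1, 0, 1, 0)
x2 <- c(0, 0, 1, 1)
data <- data.frame(fail, nofail, x1, x2)

# Main-effects model: no parameter isolates obs 3, so every fitted
# probability stays strictly inside (0, 1) and all estimates are finite.
fit_main <- glm(cbind(fail, nofail) ~ x1 + x2, data = data, family = binomial)
fitted(fit_main)   # roughly 0.61, 0.39, 0.79, 0.61 -- none at 0 or 1

# Interaction model: obs 3 gets its own parameter, its fitted
# probability is pushed to 1, and the x1:x2 estimate diverges.
fit_int <- glm(cbind(fail, nofail) ~ x1 * x2, data = data, family = binomial)
fitted(fit_int)    # obs 3 is numerically 1
```

In the main-effects model the separated cell cannot be matched exactly by any single coefficient, so the fit never pushes a probability to the boundary and the standard errors stay well-behaved; only the interaction model exposes the separation.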

One generally needs to be very suspicious of a beta coefficient of 2.731e+01: the implicit odds ratio is:

 > exp(2.731e+01)
[1] 725407933166

In this working environment there really is no material difference between Inf and 725,407,933,166.
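
If the interaction model is genuinely needed, a common remedy is a bias-reduced (Firth-type) fit, which keeps all estimates finite even under separation. A minimal sketch, assuming the brglm2 package is installed (it supplies a "brglmFit" method that plugs into glm):

```r
# Firth-type bias-reduced fit via brglm2 (assumed installed).
library(brglm2)

fail   <- c(100, 100, 100, 100)
nofail <- c(100, 100,   0, 100)
x1 <- c(1, 0, 1, 0)
x2 <- c(0, 0, 1, 1)
data <- data.frame(fail, nofail, x1, x2)

fit_br <- glm(cbind(fail, nofail) ~ x1 * x2,
              data = data, family = binomial,
              method = "brglmFit")
summary(fit_br)   # x1:x2 estimate is large but finite, with a usable SE
```

The penalized likelihood shrinks the x1:x2 estimate from infinity to a large but finite value, so the coefficient and its standard error become interpretable.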
