Does Quasi Separation matter in R binomial GLM?
Question
I am learning how quasi-separation affects a binomial GLM in R, and I am starting to think that it does not matter in some circumstances.
As I understand it, data exhibit quasi-separation when some linear combination of factor levels can completely identify failure/non-failure.
So I created an artificial dataset with quasi-separation in R:
fail <- c(100,100,100,100)
nofail <- c(100,100,0,100)
x1 <- c(1,0,1,0)
x2 <- c(0,0,1,1)
data <- data.frame(fail,nofail,x1,x2)
rownames(data) <- paste("obs",1:4)
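The separation is easy to see from the raw counts. A minimal check of my own (not part of the original post): the empirical failure proportion is exactly 1 in obs 3 and 0.5 everywhere else, which is what creates the quasi-separation.

```r
# Empirical failure proportion per row of the artificial data.
fail   <- c(100, 100, 100, 100)
nofail <- c(100, 100, 0, 100)
x1 <- c(1, 0, 1, 0)
x2 <- c(0, 0, 1, 1)
prop <- fail / (fail + nofail)
print(data.frame(x1, x2, prop))  # obs 3 (x1 = 1, x2 = 1) has prop = 1
```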
Then, when x1=1 and x2=1 (obs 3), the data always fail (nofail is 0 there). In this data, my covariate matrix has three columns: intercept, x1 and x2.
As I understand it, quasi-separation leads to infinite parameter estimates, so the glm fit should fail. However, the following glm fit does NOT fail:
summary(glm(cbind(fail,nofail)~x1+x2,data=data,family=binomial))
The result is:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.4342 0.1318 -3.294 0.000986 ***
x1 0.8684 0.1660 5.231 1.69e-07 ***
x2 0.8684 0.1660 5.231 1.69e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The Std. Error seems very reasonable even with the quasi-separation. Could anyone tell me why the quasi-separation is NOT affecting the glm fit result?
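One way to see why the additive fit is untroubled (a check of my own, assuming the data frame constructed above): none of the fitted probabilities from ~ x1 + x2 is pushed to 0 or 1, so no coefficient needs to run off to infinity.

```r
# Rebuild the data and refit the additive model from the question.
fail   <- c(100, 100, 100, 100)
nofail <- c(100, 100, 0, 100)
x1 <- c(1, 0, 1, 0)
x2 <- c(0, 0, 1, 1)
data <- data.frame(fail, nofail, x1, x2)

fit_add <- glm(cbind(fail, nofail) ~ x1 + x2, data = data, family = binomial)
print(fitted(fit_add))  # all four fitted probabilities lie strictly inside (0, 1)
```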
Answer
You have constructed an interesting example, but you are not testing a model that actually examines the situation that you describe as quasi-separation. When you say "when x1=1 and x2=1 (obs 3) the data always fails", you are implying the need for an interaction term in the model. Notice that this produces a "more interesting" result:
> summary(glm(cbind(fail,nofail)~x1*x2,data=data,family=binomial))
Call:
glm(formula = cbind(fail, nofail) ~ x1 * x2, family = binomial,
data = data)
Deviance Residuals:
[1] 0 0 0 0
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.367e-17 1.414e-01 0.000 1
x1 2.675e-17 2.000e-01 0.000 1
x2 2.965e-17 2.000e-01 0.000 1
x1:x2 2.731e+01 5.169e+04 0.001 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1.2429e+02 on 3 degrees of freedom
Residual deviance: 2.7538e-10 on 0 degrees of freedom
AIC: 25.257
Number of Fisher Scoring iterations: 22
One generally needs to be very suspicious of a beta coefficient of 2.731e+01: the implied odds ratio is:
> exp(2.731e+01)
[1] 725407933166
In this working environment there really is no material difference between Inf and 725,407,933,166.
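Because the Wald standard error explodes under separation (hence z ≈ 0 and p ≈ 1 for x1:x2 despite a perfectly separated cell), a likelihood-ratio test is a more reliable way to assess the interaction. A sketch of my own, not part of the original answer:

```r
# Rebuild the data, then compare the additive and interaction models
# with a likelihood-ratio (deviance) test instead of the Wald test.
fail   <- c(100, 100, 100, 100)
nofail <- c(100, 100, 0, 100)
x1 <- c(1, 0, 1, 0)
x2 <- c(0, 0, 1, 1)
data <- data.frame(fail, nofail, x1, x2)

fit0 <- glm(cbind(fail, nofail) ~ x1 + x2, data = data, family = binomial)
fit1 <- glm(cbind(fail, nofail) ~ x1 * x2, data = data, family = binomial)
print(anova(fit0, fit1, test = "Chisq"))  # the interaction is highly significant
```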