Why am I getting "algorithm did not converge" and "fitted prob numerically 0 or 1" warnings with glm?

Question

So this is a very simple question, just can't seem to figure it out.

I'm running a logit using the glm function, but I keep getting warning messages relating to the independent variables. They were stored as factors and I've changed them to numeric, but had no luck. I also coded them as 0/1, but that did not work either.

Please help!

> mod2 <- glm(winorlose1 ~ bid1, family="binomial")
Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 

I also tried it in Zelig, but got a similar error:

> mod2 = zelig(factor(winorlose1) ~ bid1, data=dat, model="logit")
How to cite this model in Zelig:
Kosuke Imai, Gary King, and Oliva Lau. 2008. "logit: Logistic Regression for Dichotomous Dependent Variables" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 

> str(dat)
'data.frame':   3493 obs. of  3 variables:
 $ winorlose1: int  2 2 2 2 2 2 2 2 2 2 ...
 $ bid1      : int  700 300 700 300 500 300 300 700 300 300 ...
 $ home      : int  1 0 1 0 0 0 0 1 0 0 ...
 - attr(*, "na.action")=Class 'omit'  Named int [1:63021] 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 ...
  .. ..- attr(*, "names")= chr [1:63021] "3494" "3495" "3496" "3497" ...

Recommended answer

If you look at ?glm (or even do a Google search for your second warning message), you may stumble across this in the documentation:

For the background to warning messages about ‘fitted probabilities numerically 0 or 1 occurred’ for binomial GLMs, see Venables & Ripley (2002, pp. 197–8).

Now, not everyone has that book. But assuming it's kosher for me to do this, here's the relevant passage:

There is one fairly common circumstance in which both convergence problems and the Hauck-Donner phenomenon can occur. This is when the fitted probabilities are extremely close to zero or one. Consider a medical diagnosis problem with thousands of cases and around 50 binary explanatory variables (which may arise from coding fewer categorical variables); one of these indicators is rarely true but always indicates that the disease is present. Then the fitted probabilities of cases with that indicator should be one, which can only be achieved by taking βi = ∞. The result from glm will be warnings and an estimated coefficient of around +/- 10. There has been fairly extensive discussion of this in the statistical literature, usually claiming non-existence of maximum likelihood estimates; see Santner and Duffy (1989, p. 234).
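The situation described in that passage can be reproduced on a small scale. The following is a minimal sketch on hypothetical toy data (the vectors x and y are invented for illustration and are not the asker's data): the outcome is perfectly predicted by the predictor, so the likelihood keeps improving as the slope grows without bound. The warnings are collected into a vector rather than printed, so they can be inspected.

```r
# Hypothetical toy data: y is 0 whenever x <= 5 and 1 whenever x >= 6,
# so x separates the two outcomes perfectly.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# Collect the warnings raised during fitting instead of printing them.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)               # warnings emitted by glm.fit
print(coef(fit))          # the slope is large in magnitude
print(range(fitted(fit))) # fitted probabilities pushed to the edges of [0, 1]
```

Whether the "did not converge" warning also appears can depend on the iteration limit and tolerance in glm.control, so this sketch only relies on the fitted-probabilities warning.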

One of the authors of this book commented in somewhat more detail here. So the lesson here is to look carefully at one of the levels of your predictor. (And Google the warning message!)
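One way to act on that advice is to cross-tabulate the outcome against the levels of the predictor and look for a level whose cases all share the same outcome. The sketch below assumes the predictor is treated as categorical; the column names mirror the question, but the values are hypothetical toy data, not the asker's.

```r
# Hypothetical toy data; the column names mirror the question, the values do not.
dat <- data.frame(
  bid1       = c(300, 300, 300, 500, 500, 500, 700, 700, 700),
  winorlose1 = c(0,   1,   0,   1,   0,   1,   1,   1,   1)
)

# Rows: predictor levels; columns: outcome values.
tab <- table(bid = dat$bid1, outcome = dat$winorlose1)
print(tab)

# A level whose row contains a zero cell has only one outcome among its cases.
separated <- rownames(tab)[apply(tab == 0, 1, any)]
print(separated)
```

In this toy table every case with a bid of 700 has the same outcome, which is the kind of level the passage above warns about.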
