Confidence intervals for predictions from logistic regression

Question

In R, predict.lm computes predictions from a fitted linear regression and can also compute confidence intervals for those predictions. According to the manual, these intervals are based on the error variance of the fit, not on the error intervals of the coefficients.

On the other hand, predict.glm, which computes predictions from logistic and Poisson regression (among a few others), has no option for confidence intervals. I even have a hard time imagining how such confidence intervals could be computed so as to give meaningful insight for Poisson and logistic regression.

Are there cases in which it is meaningful to provide confidence intervals for such predictions? How can they be interpreted? And what are the assumptions in these cases?

Recommended answer

The usual way is to compute a confidence interval on the scale of the linear predictor, where the estimate is closer to normal (Gaussian), and then apply the inverse of the link function to map the interval from the linear predictor scale to the response scale.
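To illustrate why the link scale is preferred, here is a minimal sketch that compares a naive interval built directly on the response scale with one built on the link scale and mapped back. It assumes the mtcars-based example data used later in this answer; the names nd, naive and mapped are made up for the illustration.

```r
## Example data (assumption: same mtcars subset as in the worked example)
foo <- data.frame(x = mtcars$mpg, y = mtcars$vs)
mod <- glm(y ~ x, data = foo, family = binomial)
nd  <- data.frame(x = range(foo$x))   # the two extremes of x

## Naive interval: +/- 1.96 SE directly on the probability scale
p_resp <- predict(mod, newdata = nd, type = "response", se.fit = TRUE)
naive  <- cbind(lwr = p_resp$fit - 1.96 * p_resp$se.fit,
                upr = p_resp$fit + 1.96 * p_resp$se.fit)

## Interval on the logit scale, then mapped back with the inverse link
p_link <- predict(mod, newdata = nd, type = "link", se.fit = TRUE)
mapped <- cbind(lwr = plogis(p_link$fit - 1.96 * p_link$se.fit),
                upr = plogis(p_link$fit + 1.96 * p_link$se.fit))

naive    # symmetric around the fit; nothing keeps it inside [0, 1]
mapped   # asymmetric, but always inside [0, 1]
```

The back-transformed limits can never leave the unit interval because plogis() maps any real number into (0, 1), whereas the naive limits have no such guarantee near 0 or 1.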

To do this you need two things:

  1. call predict() with type = "link", and
  2. call predict() with se.fit = TRUE.

The first produces predictions on the scale of the linear predictor; the second returns the standard errors of those predictions. For example:

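## Working example data from mtcars: x = miles per gallon, y = engine shape (0/1)
foo <- mtcars[, c("mpg", "vs")]
names(foo) <- c("x", "y")
mod <- glm(y ~ x, data = foo, family = binomial)
## Grid of 100 x values spanning the observed range
preddata <- with(foo, data.frame(x = seq(min(x), max(x), length.out = 100)))
## Predictions and their standard errors on the link (logit) scale
preds <- predict(mod, newdata = preddata, type = "link", se.fit = TRUE)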

preds is then a list with components fit and se.fit (plus residual.scale).
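As a quick check of that structure, the sketch below refits the same example model and inspects the returned list. The data setup repeats the assumption that foo is the mpg/vs subset of mtcars.

```r
## Same example model as above, rebuilt so this snippet runs on its own
foo <- data.frame(x = mtcars$mpg, y = mtcars$vs)
mod <- glm(y ~ x, data = foo, family = binomial)
preddata <- data.frame(x = seq(min(foo$x), max(foo$x), length.out = 100))
preds <- predict(mod, newdata = preddata, type = "link", se.fit = TRUE)

names(preds)            # "fit" "se.fit" "residual.scale"
length(preds$fit)       # one prediction per row of preddata
length(preds$se.fit)    # one standard error per prediction
```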

The confidence interval on the linear predictor is then

critval <- 1.96 ## approx 95% CI
upr <- preds$fit + (critval * preds$se.fit)
lwr <- preds$fit - (critval * preds$se.fit)
fit <- preds$fit

critval is chosen from a t or z (normal) distribution as required, for the coverage you want. As a rule of thumb, families with a fixed dispersion parameter (binomial, Poisson) use z, while families whose dispersion is estimated (Gaussian, Gamma, the quasi-families) use t on the residual degrees of freedom, which mirrors what summary.glm reports. The 1.96 is the value of the Gaussian distribution giving 95% coverage:

> qnorm(0.975) ## 0.975 as this is upper tail, 2.5% also in lower tail
[1] 1.959964
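The same idea generalises to other coverage levels. The sketch below is illustrative only: the names level, z_crit and t_crit are made up here, and df = 30 is a placeholder for the residual degrees of freedom you would normally take from df.residual(mod).

```r
## Two-sided critical value for an arbitrary coverage level
level  <- 0.90
z_crit <- qnorm(1 - (1 - level) / 2)

## t-based alternative for families with an estimated dispersion;
## df = 30 is an assumed value, in practice use df.residual(mod)
t_crit <- qt(1 - (1 - level) / 2, df = 30)

c(z = z_crit, t = t_crit)   # the t value is slightly larger than the z value
```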

Now we need to apply the inverse of the link function to fit, upr and lwr.

fit2 <- mod$family$linkinv(fit)
upr2 <- mod$family$linkinv(upr)
lwr2 <- mod$family$linkinv(lwr)
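If it helps to see what linkinv actually does, this small sketch compares it with the closed-form inverses for the two families mentioned in the question. The vector eta is an arbitrary set of linear-predictor values chosen for illustration.

```r
## Arbitrary linear-predictor values for illustration
eta <- c(-2, 0, 2)

## Binomial family with the default logit link: inverse is the logistic function
all.equal(binomial()$linkinv(eta), plogis(eta))

## Poisson family with the default log link: inverse is exp()
all.equal(poisson()$linkinv(eta), exp(eta))
```

Using the linkinv stored in the model's family object rather than hard-coding plogis() or exp() means the same code keeps working if the model is refitted with a different link.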

Now you can plot all three together with the data.

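library(ggplot2)
## Attach the back-transformed interval limits to the prediction grid
preddata$lwr <- lwr2
preddata$upr <- upr2
## Raw 0/1 data, the fitted curve drawn by stat_smooth(), and the interval limits in red
ggplot(data = foo, mapping = aes(x = x, y = y)) + geom_point() +
   stat_smooth(method = "glm", method.args = list(family = binomial)) +
   geom_line(data = preddata, mapping = aes(x = x, y = upr), colour = "red") +
   geom_line(data = preddata, mapping = aes(x = x, y = lwr), colour = "red")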
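The steps above can be collected into one function that works for any GLM family, including the Poisson case raised in the question. This is a sketch, not part of base R: glm_ci is a hypothetical helper name, it always uses a z critical value, and the usage example assumes the built-in warpbreaks data set.

```r
## Hypothetical helper: link-scale Wald interval mapped back to the response scale
glm_ci <- function(mod, newdata, level = 0.95) {
  p    <- predict(mod, newdata = newdata, type = "link", se.fit = TRUE)
  crit <- qnorm(1 - (1 - level) / 2)   # z-based; swap in qt() if dispersion is estimated
  inv  <- family(mod)$linkinv          # inverse link taken from the fitted model
  data.frame(newdata,
             fit = inv(p$fit),
             lwr = inv(p$fit - crit * p$se.fit),
             upr = inv(p$fit + crit * p$se.fit))
}

## Usage: Poisson regression on the built-in warpbreaks data
pmod <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)
nd   <- expand.grid(wool    = levels(warpbreaks$wool),
                    tension = levels(warpbreaks$tension))
ci   <- glm_ci(pmod, nd)
ci   # expected counts with lower and upper limits, all positive
```

The interval describes uncertainty in the expected value (a probability or a mean count) at each row of newdata, not the spread of individual future observations, and it relies on the usual large-sample normality of the linear predictor.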
