Predicted values for logistic regression from glm and stat_smooth in ggplot2 are different


Problem description



I'm trying to make this logistic regression graph in ggplot2.

df <- structure(list(y = c(2L, 7L, 776L, 19L, 12L, 26L, 7L, 12L, 8L,
24L, 20L, 16L, 12L, 10L, 23L, 20L, 16L, 12L, 18L, 22L, 23L, 22L,
13L, 7L, 20L, 12L, 13L, 11L, 11L, 14L, 10L, 8L, 10L, 11L, 5L,
5L, 1L, 2L, 1L, 1L, 0L, 0L, 0L), n = c(3L, 7L, 789L, 20L, 14L,
27L, 7L, 13L, 9L, 29L, 22L, 17L, 14L, 11L, 30L, 21L, 19L, 14L,
22L, 29L, 28L, 28L, 19L, 10L, 27L, 22L, 18L, 18L, 14L, 23L, 18L,
12L, 19L, 15L, 13L, 9L, 7L, 3L, 1L, 1L, 1L, 1L, 1L), x = c(18L,
19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L,
32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L,
45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 59L,
62L, 63L, 66L)), .Names = c("y", "n", "x"), class = "data.frame", row.names = c(NA,
-43L))


mod.fit <- glm(formula = y/n ~ x, data = df, weight=n, family = binomial(link = logit),
        na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = T))
summary(mod.fit)

Pi <- c(0.25, 0.5, 0.75)
LD <- (log(Pi /(1-Pi))-mod.fit$coefficients[1])/mod.fit$coefficients[2]
LD.summary <- data.frame(Pi , LD)
LD.summary
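
The LD calculation just inverts the fitted logit, solving logit(Pi) = b0 + b1*x for x at Pi = 0.25, 0.5 and 0.75. As a quick sanity check (purely illustrative), predicting back at those x values should recover the target probabilities:

round(predict(mod.fit, newdata = data.frame(x = LD.summary$LD), type = "response"), 2)
# approximately 0.25 0.50 0.75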


plot(df$x, df$y/df$n, xlab = "x", ylab = "Estimated probability")

lin.pred <- predict(mod.fit)
pi.hat <- exp(lin.pred)/(1 + exp(lin.pred))
lines(df$x, pi.hat, lty = 1, col = "red")


segments(x0 = LD.summary$LD, y0 = -0.1, x1 = LD.summary$LD, y1 = LD.summary$Pi,
         lty=2, col=c("darkblue","darkred","darkgreen"))
segments(x0 = 15, y0 = LD.summary$Pi, x1 = LD.summary$LD, y1 = LD.summary$Pi,
         lty=2, col=c("darkblue","darkred","darkgreen"))
legend("bottomleft", legend=c("LD25", "LD50", "LD75"), lty=2, col=c("darkblue","darkred","darkgreen"), bty="n", cex=0.75)

Here is my attempt with ggplot2:

library(ggplot2)

p <- ggplot(data = df, aes(x = x, y = y/n)) +
            geom_point() +
            stat_smooth(method = "glm", family = "binomial")

p <- p + geom_segment(aes(
                            x = LD.summary$LD
                          , y = 0
                          , xend = LD.summary$LD
                          , yend = LD.summary$Pi
                         )
                         , colour="red"
                       )

p <- p + geom_segment(aes(
                            x = 0
                          , y = LD.summary$Pi
                          , xend = LD.summary$LD
                          , yend = LD.summary$Pi
                         )
                         , colour="red"
                       )

print(p)
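
Note: in more recent ggplot2 releases stat_smooth no longer accepts family as a direct argument; model arguments go through method.args instead, so an equivalent smoother call would look something like this:

stat_smooth(method = "glm", method.args = list(family = binomial))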

Questions

  1. The predicted values from glm and stat_smooth look different. Do these two methods produce different results, or am I missing something here?
  2. My ggplot2 graph is not exactly the same as the base R graph.
  3. How do I use different colours for the line segments in ggplot2?
  4. And how do I add a legend in ggplot2?

Thanks in advance for your help and time.

Solution

Just a couple of minor additions to @mathematical.coffee's answer. Typically, geom_smooth isn't supposed to replace actual modeling, which is why it can seem inconvenient at times when you want to use specific output you'd get from glm and such. But really, all we need to do is add the fitted values to our data frame:

df$pred <- pi.hat
LD.summary$group <- c('LD25','LD50','LD75')

ggplot(df,aes(x = x, y = y/n)) + 
    geom_point() + 
    geom_line(aes(y = pred),colour = "black") + 
    geom_segment(data=LD.summary, aes(y = Pi,
                                      xend = LD,
                                      yend = Pi,
                                      col = group),x = -Inf,linetype = "dashed") + 
    geom_segment(data=LD.summary,aes(x = LD,
                                     xend = LD,
                                     yend = Pi,
                                     col = group),y = -Inf,linetype = "dashed")

The final little trick is the use of Inf and -Inf to get the dashed lines to extend all the way to the plot boundaries.
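
As for the colour and legend questions: mapping col = group gives each pair of segments its own colour and creates a legend automatically. To mimic the base-R colours and control where the legend sits, assuming the plot above has been stored in an object, say p, you could add something along these lines (the colour values and legend position are just illustrative choices):

p + scale_colour_manual(values = c(LD25 = "darkblue", LD50 = "darkred", LD75 = "darkgreen")) +
    theme(legend.position = "bottom")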

The lesson here is that if all you want to do is add a smooth to a plot, and nothing else in the plot depends on it, use geom_smooth. If you want to refer to the output from the fitted model, it's generally easier to fit the model outside ggplot and then plot.
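
Finally, on question 1: a likely reason the two curves differ is that the hand-fitted glm uses weight = n, so each row counts as n binomial trials, whereas stat_smooth as called in the question fits an unweighted model to the raw proportions. Passing the counts through the weight aesthetic, for example

ggplot(df, aes(x = x, y = y/n, weight = n)) +
    geom_point() +
    stat_smooth(method = "glm", method.args = list(family = binomial))

should bring the smoother's curve much closer to the glm fit.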
