在geom_smooth gam中为每条线调整r平方值 [英] Getting adjusted r-squared value for each line in a geom_smooth gam

查看:611
本文介绍了在geom_smooth gam中为每条线调整r平方值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用ggplot2制作了下图。

  PlotEchi = ggplot(data = Echinoidea,
aes(x =年份,y =平均值,group = aspect, linetype = aspect,shape = aspect))+
geom_errorbar(aes(ymin = mean-se,ymax = mean + se),width = .025,position = pd)+
geom_point(position = pd ,size = 2)+
geom_smooth(method =gam,公式= y〜s(x,k = 3),se = F,size = 0.5,color =black)+
xlab()+
ylab(丰度(平均值+/- SE))+
facet_wrap(〜species,scales =free,ncol = 1)+
scale_y_continuous限制= c(min(y = 0),max(echinoidea $ mean + echinoidea $ se)))+
scale_x_continuous(限制= c(min(Echinoidea $年-0.125),max(Echinoidea $年+ 0.125 )))b


$ b

.jpgrel =nofollow noreferrer>

我想要做的是轻松地检索每条拟合线的调整R平方,而不需要执行 mgcv :: gam 为每个绘制的线使用 model< -gam(df,formula = y〜s(x1)....)。有任何想法吗?

解决方案

这不是真的可行,因为ggplot2会抛出拟合对象。你可以在这里看到



1。通过修补ggplot2解决问题



一个丑陋的解决方法是即时修补ggplot2代码以打印出结果。你可以这样做,如下所示。最初的任务会抛出一个错误,但事情无论如何工作。要取消此操作,请重新启动R会话。

  library(ggplot2)

#assignInNamespace patches`predictdf .glm`,并添加
#这一行打印模型的摘要。出于某种原因,这
#会产生一个错误,但事情仍然有效。
assignInNamespace(predictdf.glm,function(model,xseq,se,level){
pred < - stats :: predict(model,newdata = data.frame(x = xseq),se如果(se){.fit = se,
type =link)

print(summary(model))#这是我添加

的行
std < - stats :: qnorm(level / 2 + 0.5)
data.frame(
x = xseq,
y = model $ family $ linkinv(as.vector(pred $ fit)),
ymin = model $ family $ linkinv(as.vector(pred $ fit - std * pred $ se.fit)),
ymax = model $ family $ linkinv(as.vector (pred $ fit + std * pred $ se.fit)),
se = as.vector(pred $ se.fit)

} else {
data.frame (x = xseq,y = model $ family $ linkinv(as.vector(pred)))
}
},ggplot2)

现在我们可以使用修补过的ggplot2绘制一个图:

  ggplot(iris,aes(Sepal.Length,Sepal.Width,color = Species))+ 
geom_point()+ geom_smooth(se = F,method =gam,公式= y〜s(x,bs =cs))



控制台输出:

 高斯
链接函数:标识

公式:
y〜s(x,bs =cs)

参数系数:
估计标准。错误t值Pr(> | t |)
(截距)3.4280 0.0365 93.91 <2e-16 ***
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1

平滑项的近似重要性:
edf Ref.df F p-价值
s(x)1.546 9 5.947 5.64e-11 ***
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1

R-sq。(adj)= 0.536偏差解释= 55.1%
GCV = 0.070196 Scale est。= 0.066622 n = 50

系列:高斯
链接功能:身份

公式:
y〜s(x, bs =cs)

参数系数:
估计标准。错误t值Pr(> | t |)
(截距)2.77000 0.03797 72.96 <2e-16 ***
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1

平滑项的近似重要性:
edf Ref.df F p-价值
s(x)1.564 9 1.961 8.42e-05 ***
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1

R-sq。(adj)= 0.268偏差解释= 29.1%
GCV = 0.075969 Scale est。= 0.072074 n = 50

系列:高斯
链接功能:身份

公式:
y〜s(x, bs =cs)

参数系数:
估计标准。错误t值Pr(> | t |)
(拦截)2.97400 0.04102 72.5 <2e-16 ***
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1

平滑项的近似重要性:
edf Ref.df F p-价值
s(x)1.279 9 1.229 0.001 **
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1

R-sq。(adj)= 0.191偏差解释= 21.2%
GCV = 0.088147 Scale est。= 0.08413 n = 50

注意:我不建议这种方法。

2。通过tidyverse拟合模型解决问题



我认为最好单独运行模型。这样做很容易使用潮汐和扫帚,所以我不确定你为什么不想这样做。

 库(tidyverse)
库(扫帚)
虹膜%>%巢( - 物种)%>%
mutate(fit = map(data,〜mgcv :: gam(Sepal。宽度〜s(Sepal.Length,bs =cs),data =。)),
results = map(fit,glance),
R.square = map_dbl(fit,〜summary(。 )$%
unnest(results)%>%
select(-data,-fit)

#物种R.square df logLik AIC BIC偏差df.residual
#1 setosa 0.5363514 2.546009 -1.922197 10.93641 17.71646 3.161460 47.45399
#2杂色0.2680611 2.563623 -3.879391 14.88603 21.69976 3.418909 47.43638
#3维吉尼卡0.1910916 2.278569 -7.895997 22.34913 28.61783 4.014793 47.72143

正如您所看到的,提取的R平方值在两种情况下完全相同。 p>

I produced the below graph using ggplot2.

PlotEchi = ggplot(data=Echinoidea, 
                  aes(x=Year, y=mean, group = aspect, linetype = aspect, shape=aspect)) + 
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.025, position=pd) + 
  geom_point(position=pd, size=2) + 
  geom_smooth(method = "gam", formula = y~s(x, k=3), se=F, size = 0.5,colour="black") + 
  xlab("") + 
  ylab("Abundance (mean +/- SE)") + 
  facet_wrap(~ species, scales = "free", ncol=1) + 
  scale_y_continuous(limits=c(min(y=0), max(Echinoidea$mean+Echinoidea$se))) + 
  scale_x_continuous(limits=c(min(Echinoidea$Year-0.125), max(Echinoidea$Year+0.125)))

What I would like to do is easily retrieve the adjusted R-square for each of the fitted lines without doing an individual mgcv::gam for each plotted line using model<-gam(df, formula = y~s(x1)....). Any ideas?

解决方案

This is not really possible, because ggplot2 throws away the fitted object. You can see this in the source here.

1. Solving the problem by patching ggplot2

One ugly workaround is to patch the ggplot2 code on the fly to print out the results. You can do this as follows. The initial assignment throws an error but things work anyways. To undo this just restart your R session.

library(ggplot2)

# assignInNamespace patches `predictdf.glm` from ggplot2 and adds 
# a line that prints the summary of the model. For some reason, this
# creates an error, but things work nonetheless.
assignInNamespace("predictdf.glm", function(model, xseq, se, level) {
  pred <- stats::predict(model, newdata = data.frame(x = xseq), se.fit = se,
                         type = "link")

  print(summary(model)) # this is the line I added

  if (se) {
    std <- stats::qnorm(level / 2 + 0.5)
    data.frame(
      x = xseq,
      y = model$family$linkinv(as.vector(pred$fit)),
      ymin = model$family$linkinv(as.vector(pred$fit - std * pred$se.fit)),
      ymax = model$family$linkinv(as.vector(pred$fit + std * pred$se.fit)),
      se = as.vector(pred$se.fit)
    )
  } else {
    data.frame(x = xseq, y = model$family$linkinv(as.vector(pred)))
  }
}, "ggplot2")

Now we can make a plot with the patched ggplot2:

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() + geom_smooth(se = F, method = "gam", formula = y ~ s(x, bs = "cs"))

Console output:

Family: gaussian 
Link function: identity 

Formula:
y ~ s(x, bs = "cs")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.4280     0.0365   93.91   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
       edf Ref.df     F  p-value    
s(x) 1.546      9 5.947 5.64e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.536   Deviance explained = 55.1%
GCV = 0.070196  Scale est. = 0.066622  n = 50

Family: gaussian 
Link function: identity 

Formula:
y ~ s(x, bs = "cs")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.77000    0.03797   72.96   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
       edf Ref.df     F  p-value    
s(x) 1.564      9 1.961 8.42e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.268   Deviance explained = 29.1%
GCV = 0.075969  Scale est. = 0.072074  n = 50

Family: gaussian 
Link function: identity 

Formula:
y ~ s(x, bs = "cs")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.97400    0.04102    72.5   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
       edf Ref.df     F p-value   
s(x) 1.279      9 1.229   0.001 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.191   Deviance explained = 21.2%
GCV = 0.088147  Scale est. = 0.08413   n = 50

Note: I do not recommend this approach.

2. Solving the problem by fitting models via tidyverse

I think it's better to just run your models separately. Doing so is quite easy with tidyverse and broom, so I'm not sure why you wouldn't want to do it.

library(tidyverse)
library(broom)
iris %>% nest(-Species) %>% 
  mutate(fit = map(data, ~mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .)),
         results = map(fit, glance),
         R.square = map_dbl(fit, ~ summary(.)$r.sq)) %>%
  unnest(results) %>%
  select(-data, -fit)

#      Species  R.square       df    logLik      AIC      BIC deviance df.residual
# 1     setosa 0.5363514 2.546009 -1.922197 10.93641 17.71646 3.161460    47.45399
# 2 versicolor 0.2680611 2.563623 -3.879391 14.88603 21.69976 3.418909    47.43638
# 3  virginica 0.1910916 2.278569 -7.895997 22.34913 28.61783 4.014793    47.72143

As you can see, the extracted R squared values are exactly the same in both cases.

这篇关于在geom_smooth gam中为每条线调整r平方值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆