在geom_smooth gam中为每条线调整r平方值 [英] Getting adjusted r-squared value for each line in a geom_smooth gam
问题描述
我使用ggplot2制作了下图。
PlotEchi = ggplot(data = Echinoidea,
aes(x =年份,y =平均值,group = aspect, linetype = aspect,shape = aspect))+
geom_errorbar(aes(ymin = mean-se,ymax = mean + se),width = .025,position = pd)+
geom_point(position = pd ,size = 2)+
geom_smooth(method =gam,公式= y〜s(x,k = 3),se = F,size = 0.5,color =black)+
xlab()+
ylab(丰度(平均值+/- SE))+
facet_wrap(〜species,scales =free,ncol = 1)+
scale_y_continuous限制= c(min(y = 0),max(echinoidea $ mean + echinoidea $ se)))+
scale_x_continuous(限制= c(min(Echinoidea $年-0.125),max(Echinoidea $年+ 0.125 )))b
$ b .jpgrel =nofollow noreferrer>
我想要做的是轻松地检索每条拟合线的调整R平方,而不需要执行 mgcv :: gam
为每个绘制的线使用 model< -gam(df,formula = y〜s(x1)....)
。有任何想法吗?
解决方案这不是真的可行,因为ggplot2会抛出拟合对象。你可以在这里看到
1。通过修补ggplot2解决问题
一个丑陋的解决方法是即时修补ggplot2代码以打印出结果。你可以这样做,如下所示。最初的任务会抛出一个错误,但事情无论如何工作。要取消此操作,请重新启动R会话。
library(ggplot2)
#assignInNamespace patches`predictdf .glm`,并添加
#这一行打印模型的摘要。出于某种原因,这
#会产生一个错误,但事情仍然有效。
assignInNamespace(predictdf.glm,function(model,xseq,se,level){
pred < - stats :: predict(model,newdata = data.frame(x = xseq),se如果(se){.fit = se,
type =link)
print(summary(model))#这是我添加
的行
std < - stats :: qnorm(level / 2 + 0.5)
data.frame(
x = xseq,
y = model $ family $ linkinv(as.vector(pred $ fit)),
ymin = model $ family $ linkinv(as.vector(pred $ fit - std * pred $ se.fit)),
ymax = model $ family $ linkinv(as.vector (pred $ fit + std * pred $ se.fit)),
se = as.vector(pred $ se.fit)
)
} else {
data.frame (x = xseq,y = model $ family $ linkinv(as.vector(pred)))
}
},ggplot2)
现在我们可以使用修补过的ggplot2绘制一个图:
ggplot(iris,aes(Sepal.Length,Sepal.Width,color = Species))+
geom_point()+ geom_smooth(se = F,method =gam,公式= y〜s(x,bs =cs))
控制台输出:
高斯
链接函数:标识
公式:
y〜s(x,bs =cs)
参数系数:
估计标准。错误t值Pr(> | t |)
(截距)3.4280 0.0365 93.91 <2e-16 ***
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1
平滑项的近似重要性:
edf Ref.df F p-价值
s(x)1.546 9 5.947 5.64e-11 ***
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1
R-sq。(adj)= 0.536偏差解释= 55.1%
GCV = 0.070196 Scale est。= 0.066622 n = 50
系列:高斯
链接功能:身份
公式:
y〜s(x, bs =cs)
参数系数:
估计标准。错误t值Pr(> | t |)
(截距)2.77000 0.03797 72.96 <2e-16 ***
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1
平滑项的近似重要性:
edf Ref.df F p-价值
s(x)1.564 9 1.961 8.42e-05 ***
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1
R-sq。(adj)= 0.268偏差解释= 29.1%
GCV = 0.075969 Scale est。= 0.072074 n = 50
系列:高斯
链接功能:身份
公式:
y〜s(x, bs =cs)
参数系数:
估计标准。错误t值Pr(> | t |)
(拦截)2.97400 0.04102 72.5 <2e-16 ***
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1
平滑项的近似重要性:
edf Ref.df F p-价值
s(x)1.279 9 1.229 0.001 **
---
Signif。代码:0'***'0.001'**'0.01'*'0.05'。'0.1''1
R-sq。(adj)= 0.191偏差解释= 21.2%
GCV = 0.088147 Scale est。= 0.08413 n = 50
注意:我不建议这种方法。
2。通过tidyverse拟合模型解决问题
我认为最好单独运行模型。这样做很容易使用潮汐和扫帚,所以我不确定你为什么不想这样做。
库(tidyverse)
库(扫帚)
虹膜%>%巢( - 物种)%>%
mutate(fit = map(data,〜mgcv :: gam(Sepal。宽度〜s(Sepal.Length,bs =cs),data =。)),
results = map(fit,glance),
R.square = map_dbl(fit,〜summary(。 )$%
unnest(results)%>%
select(-data,-fit)
#物种R.square df logLik AIC BIC偏差df.residual
#1 setosa 0.5363514 2.546009 -1.922197 10.93641 17.71646 3.161460 47.45399
#2杂色0.2680611 2.563623 -3.879391 14.88603 21.69976 3.418909 47.43638
#3维吉尼卡0.1910916 2.278569 -7.895997 22.34913 28.61783 4.014793 47.72143
正如您所看到的,提取的R平方值在两种情况下完全相同。 p>
I produced the below graph using ggplot2.
PlotEchi = ggplot(data=Echinoidea,
aes(x=Year, y=mean, group = aspect, linetype = aspect, shape=aspect)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.025, position=pd) +
geom_point(position=pd, size=2) +
geom_smooth(method = "gam", formula = y~s(x, k=3), se=F, size = 0.5,colour="black") +
xlab("") +
ylab("Abundance (mean +/- SE)") +
facet_wrap(~ species, scales = "free", ncol=1) +
scale_y_continuous(limits=c(min(y=0), max(Echinoidea$mean+Echinoidea$se))) +
scale_x_continuous(limits=c(min(Echinoidea$Year-0.125), max(Echinoidea$Year+0.125)))
What I would like to do is easily retrieve the adjusted R-square for each of the fitted lines without doing an individual mgcv::gam
for each plotted line using model<-gam(df, formula = y~s(x1)....)
. Any ideas?
解决方案 This is not really possible, because ggplot2 throws away the fitted object. You can see this in the source here.
1. Solving the problem by patching ggplot2
One ugly workaround is to patch the ggplot2 code on the fly to print out the results. You can do this as follows. The initial assignment throws an error but things work anyways. To undo this just restart your R session.
library(ggplot2)
# assignInNamespace patches `predictdf.glm` from ggplot2 and adds
# a line that prints the summary of the model. For some reason, this
# creates an error, but things work nonetheless.
assignInNamespace("predictdf.glm", function(model, xseq, se, level) {
pred <- stats::predict(model, newdata = data.frame(x = xseq), se.fit = se,
type = "link")
print(summary(model)) # this is the line I added
if (se) {
std <- stats::qnorm(level / 2 + 0.5)
data.frame(
x = xseq,
y = model$family$linkinv(as.vector(pred$fit)),
ymin = model$family$linkinv(as.vector(pred$fit - std * pred$se.fit)),
ymax = model$family$linkinv(as.vector(pred$fit + std * pred$se.fit)),
se = as.vector(pred$se.fit)
)
} else {
data.frame(x = xseq, y = model$family$linkinv(as.vector(pred)))
}
}, "ggplot2")
Now we can make a plot with the patched ggplot2:
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
geom_point() + geom_smooth(se = F, method = "gam", formula = y ~ s(x, bs = "cs"))
Console output:
Family: gaussian
Link function: identity
Formula:
y ~ s(x, bs = "cs")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.4280 0.0365 93.91 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 1.546 9 5.947 5.64e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.536 Deviance explained = 55.1%
GCV = 0.070196 Scale est. = 0.066622 n = 50
Family: gaussian
Link function: identity
Formula:
y ~ s(x, bs = "cs")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.77000 0.03797 72.96 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 1.564 9 1.961 8.42e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.268 Deviance explained = 29.1%
GCV = 0.075969 Scale est. = 0.072074 n = 50
Family: gaussian
Link function: identity
Formula:
y ~ s(x, bs = "cs")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.97400 0.04102 72.5 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x) 1.279 9 1.229 0.001 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.191 Deviance explained = 21.2%
GCV = 0.088147 Scale est. = 0.08413 n = 50
Note: I do not recommend this approach.
2. Solving the problem by fitting models via tidyverse
I think it's better to just run your models separately. Doing so is quite easy with tidyverse and broom, so I'm not sure why you wouldn't want to do it.
library(tidyverse)
library(broom)
iris %>% nest(-Species) %>%
mutate(fit = map(data, ~mgcv::gam(Sepal.Width ~ s(Sepal.Length, bs = "cs"), data = .)),
results = map(fit, glance),
R.square = map_dbl(fit, ~ summary(.)$r.sq)) %>%
unnest(results) %>%
select(-data, -fit)
# Species R.square df logLik AIC BIC deviance df.residual
# 1 setosa 0.5363514 2.546009 -1.922197 10.93641 17.71646 3.161460 47.45399
# 2 versicolor 0.2680611 2.563623 -3.879391 14.88603 21.69976 3.418909 47.43638
# 3 virginica 0.1910916 2.278569 -7.895997 22.34913 28.61783 4.014793 47.72143
As you can see, the extracted R squared values are exactly the same in both cases.
这篇关于在geom_smooth gam中为每条线调整r平方值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!