mgcv_1.8-24:“fREML"或“REML"bam() 的方法给出了错误的解释偏差 [英] mgcv_1.8-24: "fREML" or "REML" method of bam() gives wrong explained deviance
问题描述
使用fREML"和REML"方法将相同的模型与 bam
拟合给了我接近的结果,但是解释的偏差与 summary.gam
返回的不同.
Fitting the same model with bam
using methods "fREML" and "REML" gave me close results, but the deviance explained is rather different as returned by summary.gam
.
fREML"的数量是~3.5%(不好),而REML"的数量是~50%(不是那么糟糕).怎么可能?哪一个是正确的?
With "fREML" the quantity is ~3.5% (not good) while with "REML" it is ~50% (not that bad). How can it be possible? Which one is correct?
很遗憾,我无法提供一个简单的可重现示例.
Unfortunately, I cannot provide a simple reproducible example.
#######################################
## method = "fREML", discrete = TRUE ##
#######################################
Family: binomial
Link function: logit
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.0026 0.2199 -22.75 <2e-16
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(Var1) 1.00 1.001 17.54 2.82e-05
s(RandomVar) 16.39 19.000 145.03 < 2e-16
R-sq.(adj) = 0.00349 Deviance explained = 3.57%
fREML = 2.8927e+05 Scale est. = 1 n = 312515
<小时>
########################################
## method = "fREML", discrete = FALSE ##
########################################
Family: binomial
Link function: logit
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.8941 0.2207 -22.18 <2e-16
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(Var1) 1.008 1.016 17.44 3.09e-05
s(RandomVar) 16.390 19.000 144.86 < 2e-16
R-sq.(adj) = 0.00349 Deviance explained = 3.57%
fREML = 3.1556e+05 Scale est. = 1 n = 312515
<小时>
#####################################################
## method = "REML", discrete method not applicable ##
#####################################################
Family: binomial
Link function: logit
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.8928 0.2205 -22.19 <2e-16
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(Var1) 1.156 1.278 16.57 8.53e-05
s(RandomVar) 16.379 19.000 142.60 < 2e-16
R-sq.(adj) = 0.0035 Deviance explained = 50.8%
-REML = 3.1555e+05 Scale est. = 1 n = 312515
推荐答案
这个问题可以回溯到mgcv_1.8-23
.它的 changlog 用于读取:
This issue can be back tracked to mgcv_1.8-23
. Its changlog for reads:
* bam extended family extension had introduced a bug in null deviance
computation for Gaussian additive case when using methods other than fREML
or GCV.Cp. Fixed.
现在证明补丁对高斯情况成功,但对非高斯情况不成功.
It now turns out that the patch is successful for Gaussian case, but not for non-Gaussian cases.
让我首先提供一个可重现的示例,因为您的问题没有.
Let me first provide a reproducible example, as your question does not have one.
set.seed(0)
x <- runif(1000)
## the linear predictor is a 3rd degree polynomial
p <- binomial()$linkinv(0.5 + poly(x, 3) %*% rnorm(3) * 20)
## p is well spread out on (0, 1); check `hist(p)`
y <- rbinom(1000, 1, p)
library(mgcv)
#Loading required package: nlme
#This is mgcv 1.8-24. For overview type 'help("mgcv-package")'.
fREML <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "fREML")
REML <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "REML")
GCV <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "GCV.Cp")
## explained.deviance = (null.deviance - deviance) / null.deviance
## so in this example we get negative explained deviance for "REML" method
unlist(REML[c("null.deviance", "deviance")])
#null.deviance deviance
# 181.7107 1107.5241
unlist(fREML[c("null.deviance", "deviance")])
#null.deviance deviance
# 1357.936 1107.524
unlist(GCV[c("null.deviance", "deviance")])
#null.deviance deviance
# 1357.936 1108.108
Null deviance 不能小于 deviance(TSS 不能小于 RSS),所以 bam
的REML"方法在这里无法返回正确的 Null 偏差.
Null deviance can't be smaller than deviance (TSS can not be smaller than RSS), so "REML" method of bam
fails to return the correct Null deviance here.
我在mgcv_1.8-24/R/bam.r
的第1350行找到了问题:
I have located the problem at line 1350 of mgcv_1.8-24/R/bam.r
:
object$family <- object$fitted.values <- NULL
其实应该是
object$null.deviance <- object$fitted.values <- NULL
对于除GCV.Cp"和fREML"之外的方法,bam
依赖于gam
进行估计,减少了大的nxp
将矩阵模型化为 pxp
矩阵(n
:数据数量;p
:系数数量).由于这个新的模型矩阵没有自然的解释,gam
返回的许多数量应该是无效的(除了估计的平滑参数).西蒙把family
写错了.
For methods other than "GCV.Cp" and "fREML", bam
relies on gam
for estimation, after reducing the large n x p
model matrix to a p x p
matrix (n
: number of data; p
: number of coefficients). Since this new model matrix does not have a natural interpretation, many quantities returned by gam
should be invalidated (apart from estimated smooth parameters). It was a typo for Simon to put family
.
我构建了一个补丁版本,结果证明修复了错误.我会告诉 Simon 在下一个版本中修复它.
I build a patched version and it turns out to fix the bug. I will tell Simon to get it fixed in the next release.
这篇关于mgcv_1.8-24:“fREML"或“REML"bam() 的方法给出了错误的解释偏差的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!