mgcv_1.8-24:“fREML"或“REML"bam() 的方法给出了错误的解释偏差 [英] mgcv_1.8-24: "fREML" or "REML" method of bam() gives wrong explained deviance

查看:58
本文介绍了mgcv_1.8-24:“fREML"或“REML"bam() 的方法给出了错误的解释偏差的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用fREML"和REML"方法将相同的模型与 bam 拟合给了我接近的结果,但是解释的偏差与 summary.gam 返回的不同.

Fitting the same model with bam using methods "fREML" and "REML" gave me close results, but the deviance explained is rather different as returned by summary.gam.

fREML"的数量是~3.5%(不好),而REML"的数量是~50%(不是那么糟糕).怎么可能?哪一个是正确的?

With "fREML" the quantity is ~3.5% (not good) while with "REML" it is ~50% (not that bad). How can it be possible? Which one is correct?

很遗憾,我无法提供一个简单的可重现示例.

Unfortunately, I cannot provide a simple reproducible example.

#######################################
## method = "fREML", discrete = TRUE ##
#######################################

Family: binomial 
Link function: logit 
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")  
Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|) 
(Intercept)  -5.0026     0.2199  -22.75   <2e-16  
Approximate significance of smooth terms:
                  edf Ref.df Chi.sq  p-value 
s(Var1)          1.00  1.001  17.54 2.82e-05 
s(RandomVar)     16.39 19.000 145.03  < 2e-16  
R-sq.(adj) =  0.00349   Deviance explained = 3.57%
fREML = 2.8927e+05  Scale est. = 1         n = 312515

<小时>

########################################
## method = "fREML", discrete = FALSE ##
########################################

Family: binomial 
Link function: logit 
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")  
Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|) 
(Intercept)  -4.8941     0.2207  -22.18   <2e-16  
Approximate significance of smooth terms:
                  edf Ref.df Chi.sq  p-value 
s(Var1)          1.008  1.016  17.44 3.09e-05 
s(RandomVar)     16.390 19.000 144.86  < 2e-16  
R-sq.(adj) =  0.00349   Deviance explained = 3.57%
fREML = 3.1556e+05  Scale est. = 1         n = 312515

<小时>

#####################################################
## method = "REML", discrete method not applicable ##
#####################################################

Family: binomial 
Link function: logit 
Formula:
ObsOrRand ~ s(Var1, k = 3) + s(RandomVar, bs = "re")  
Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|) 
(Intercept)  -4.8928     0.2205  -22.19   <2e-16  
Approximate significance of smooth terms:
                  edf Ref.df Chi.sq  p-value 
s(Var1)          1.156  1.278  16.57 8.53e-05 
s(RandomVar)     16.379 19.000 142.60  < 2e-16  
R-sq.(adj) =  0.0035   Deviance explained = 50.8%
-REML = 3.1555e+05  Scale est. = 1         n = 312515

推荐答案

这个问题可以回溯到mgcv_1.8-23.它的 changlog 用于读取:

This issue can be back tracked to mgcv_1.8-23. Its changlog for reads:

* bam extended family extension had introduced a bug in null deviance 
  computation for Gaussian additive case when using methods other than fREML 
  or GCV.Cp. Fixed.

现在证明补丁对高斯情况成功,但对非高斯情况不成功.

It now turns out that the patch is successful for Gaussian case, but not for non-Gaussian cases.

让我首先提供一个可重现的示例,因为您的问题没有.

Let me first provide a reproducible example, as your question does not have one.

set.seed(0)
x <- runif(1000)
## the linear predictor is a 3rd degree polynomial
p <- binomial()$linkinv(0.5 + poly(x, 3) %*% rnorm(3) * 20)
## p is well spread out on (0, 1); check `hist(p)`
y <- rbinom(1000, 1, p)

library(mgcv)
#Loading required package: nlme
#This is mgcv 1.8-24. For overview type 'help("mgcv-package")'.

fREML <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "fREML")
REML <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "REML")
GCV <- bam(y ~ s(x, bs = 'cr', k = 8), family = binomial(), method = "GCV.Cp")

## explained.deviance = (null.deviance - deviance) / null.deviance
## so in this example we get negative explained deviance for "REML" method

unlist(REML[c("null.deviance", "deviance")])
#null.deviance      deviance 
#     181.7107     1107.5241 

unlist(fREML[c("null.deviance", "deviance")])
#null.deviance      deviance 
#     1357.936      1107.524 

unlist(GCV[c("null.deviance", "deviance")])
#null.deviance      deviance 
#     1357.936      1108.108 

Null deviance 不能小于 deviance(TSS 不能小于 RSS),所以 bam 的REML"方法在这里无法返回正确的 Null 偏差.

Null deviance can't be smaller than deviance (TSS can not be smaller than RSS), so "REML" method of bam fails to return the correct Null deviance here.

我在mgcv_1.8-24/R/bam.r的第1350行找到了问题:

I have located the problem at line 1350 of mgcv_1.8-24/R/bam.r:

object$family <- object$fitted.values <- NULL

其实应该是

object$null.deviance <- object$fitted.values <- NULL

对于除GCV.Cp"和fREML"之外的方法,bam依赖于gam进行估计,减少了大的nxp将矩阵模型化为 pxp 矩阵(n:数据数量;p:系数数量).由于这个新的模型矩阵没有自然的解释,gam 返回的许多数量应该是无效的(除了估计的平滑参数).西蒙把family写错了.

For methods other than "GCV.Cp" and "fREML", bam relies on gam for estimation, after reducing the large n x p model matrix to a p x p matrix (n: number of data; p: number of coefficients). Since this new model matrix does not have a natural interpretation, many quantities returned by gam should be invalidated (apart from estimated smooth parameters). It was a typo for Simon to put family.

我构建了一个补丁版本,结果证明修复了错误.我会告诉 Simon 在下一个版本中修复它.

I build a patched version and it turns out to fix the bug. I will tell Simon to get it fixed in the next release.

这篇关于mgcv_1.8-24:“fREML"或“REML"bam() 的方法给出了错误的解释偏差的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆