具有"gp"的GAM更平滑:如何检索方差图参数? [英] GAM with "gp" smoother: how to retrieve the variogram parameters?

查看:120
本文介绍了具有"gp"的GAM更平滑:如何检索方差图参数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用以下地理叠加模型

I am using the following geoadditive model

library(gamair)
library(mgcv)

data(mack)    
mack$log.net.area <- log(mack$net.area)

gm2 <- gam(egg.count ~ s(lon,lat,bs="gp",k=100,m=c(2,10,1)) +
                       s(I(b.depth^.5)) +
                       s(c.dist) +
                       s(temp.20m) +
                       offset(log.net.area),
                       data = mack, family = tw, method = "REML")

在这里,我正在使用范围为10且功效为1(m=c(2,10,1))的指数协方差函数.如何从结果中检索变异函数参数(金块,窗台)?我在模型输出中找不到任何内容.

Here I am using an exponential covariance function with range = 10 and power = 1 (m=c(2,10,1)). How can I retrieve from the results the variogram parameters (nugget, sill)? I couldn't find anything in the model output.

推荐答案

在平滑方法中,由于指定了相关矩阵,因此您只需估算方差参数,即基石.例如,您已将m = c(2, 10, 1)设置为s(, bs = 'gp'),从而给出了范围参数为phi = 10的指数相关矩阵.请注意,除球面相关性以外,phi与范围不同.对于许多相关模型,实际范围是phi的函数.

In smoothing approach the correlation matrix is specified so you only estimate variance parameter, i.e., the sill. For example, you've set m = c(2, 10, 1) to s(, bs = 'gp'), giving an exponential correlation matrix with range parameter phi = 10. Note that phi is not identical to range, except for spherical correlation. For many correlation models the actual range is a function of phi.

方差/底线参数与惩罚回归中的平滑参数密切相关,您可以通过将scale参数除以平滑参数来获得它:

The variance / sill parameter is closely related to the smoothing parameter in penalized regression, and you can obtain it by dividing the scale parameter by smoothing parameter:

with(gm2, scale / sp["s(lon,lat)"])
#s(lon,lat) 
#  26.20877 

这是对的吗?否.这里有一个陷阱: $sp中返回的平滑参数不是真实参数,我们需要以下几点:

Is this right? No. There is a trap here: smoothing parameters returned in $sp are not real ones, and we need the following:

gm2_sill <- with(gm2, scale / sp["s(lon,lat)"] * smooth[[1]]$S.scale) 
#s(lon,lat) 
#    7.7772 

然后复制您指定的range参数:

And we copy in the range parameter you've specified:

gm2_phi <- 10

由于平滑函数是连续的,所以掘金必须为零.使用geoR包中的lines.variomodel函数,可以可视化由s(lon,lat)建模的潜在高斯空间随机场的半变异函数.

The nugget must be zero, since a smooth function is continuous. Using lines.variomodel function from geoR package, you can visualize the semivariogram for the latent Gaussian spatial random field modeled by s(lon,lat).

library(geoR)
lines.variomodel(cov.model = "exponential", cov.pars = c(gm2_sill, gm2_phi),
                 nugget = 0, max.dist = 60)
abline(h = gm2_sill, lty = 2)

但是,请对此变异函数表示怀疑. mgcv并不是解释地统计学的简单环境.使用低秩平滑器建议上述方差参数用于新参数空间中的参数,而不是原始参数空间中的参数.例如,对于mack数据集,空间字段中存在630个唯一的空间位置,因此相关矩阵应为630 x 630,并且完全随机效应应为长度为630的向量.但是,通过在s(, bs = 'gp')中设置k = 100,截短的本征分解和随后的低秩近似将随机效应减小到length-100.对于此向量,方差参数实际上不是原始参数.这可能可以解释为什么门槛和实际范围与数据和预测的s(lon,lat)不一致.

However, be skeptical on this variogram. mgcv is not an easy environment to interpret geostatistics. The use of low-rank smoothers suggests that the above variance parameter is for parameters in the new parameter space rather than the original one. For example, there are 630 unique spatial locations in the spatial field for mack dataset, so the correlation matrix should be 630 x 630, and the full random effects should be a vector of length-630. But by setting k = 100 in s(, bs = 'gp') the truncated eigen decomposition and subsequent low-rank approximation reduce the random effects to length-100. The variance parameter is really for this vector not the original one. This might explain why the sill and the actual range do not agree with the data and predicted s(lon,lat).

## unique locations
loc <- unique(mack[, c("lon", "lat")])

max(dist(loc))
#[1] 15.98

数据集中两个空间位置之间的最大距离为15.98,但从方差图得出的实际范围似乎在40到60之间,这太大了.

The maximum distance between two spatial locations in the dataset is 15.98, but the actual range from the variogram seems to be somewhere between 40 and 60, which is too large.

## predict `s(lon, lat)`, using the method I told you in your last question
## https://stackoverflow.com/q/51634953/4891738
sp <- predict(gm2,
              data.frame(loc, b.depth = 0, c.dist = 0, temp.20m = 0,
                         log.net.area = 0),
              type = "terms", terms = "s(lon,lat)")

c(var(sp))
#[1] 1.587126

预测的s(lon,lat)仅具有1.587的方差,但在7.77处的底线要高得多.

The predicted s(lon,lat) only has variance 1.587, but the sill at 7.77 is way much higher.

这篇关于具有"gp"的GAM更平滑:如何检索方差图参数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆