mgcv:如何在自适应平滑中提取P样条的结,基,系数和预测? [英] mgcv: how to extract knots, basis, coefficients and predictions for P-splines in adaptive smooth?

查看:198
本文介绍了mgcv:如何在自适应平滑中提取P样条的结,基,系数和预测?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在R中使用mgcv软件包通过以下方式将一些多项式样条拟合为某些数据:

I'm using the mgcv package in R to fit some polynomial splines to some data via:

x.gam <- gam(cts ~ s(time, bs = "ad"), data = x.dd,
             family = poisson(link = "log"))

我正在尝试提取拟合的功能形式. x.gamgamObject,我一直在阅读文档,但没有找到足够的信息来手动重建拟合函数.

I'm trying to extract the functional form of the fit. x.gam is a gamObject, and I've been reading the documentation but haven't found enough information in order to manually reconstruct the fitted function.

  • x.gam$smooth包含有关是否已打结的信息;
  • x.gam$coefficients给出了样条系数,但是我不知道使用了什么阶次的多项式样条,并且在代码中查看并没有发现任何东西.
  • x.gam$smooth contains information about whether the knots have been placed;
  • x.gam$coefficients gives the spline coefficients, but I don't know what order polynomial splines are used and looking in the code has not revealed anything.

是否有一种整齐的方法来提取所使用的结,系数和基础,以便人们可以手动重建拟合?

推荐答案

我没有您的数据,因此我以?adaptive.smooth中的以下示例为您展示可以在哪里找到所需信息.请注意,尽管此示例是针对高斯数据而不是泊松数据,但只有链接函数有所不同.其余的只是标准的.

I don't have your data, so I take the following example from ?adaptive.smooth to show you where you can find information you want. Note that though this example is for Gaussian data rather than Poisson data, only the link function is different; all the rest are just standard.

 x <- 1:1000/1000  # data between [0, 1]
 mu <- exp(-400*(x-.6)^2)+5*exp(-500*(x-.75)^2)/3+2*exp(-500*(x-.9)^2)
 y <- mu+0.5*rnorm(1000)
 b <- gam(y~s(x,bs="ad",k=40,m=5))

现在,所有有关平滑构造的信息都存储在b$smooth中,我们将其删除:

Now, all information on smooth construction is stored in b$smooth, we take it out:

smooth <- b$smooth[[1]]  ## extract smooth object for first smooth term

结:

smooth$knots为您提供结的位置.

> smooth$knots
 [1] -0.081161 -0.054107 -0.027053  0.000001  0.027055  0.054109  0.081163
 [8]  0.108217  0.135271  0.162325  0.189379  0.216433  0.243487  0.270541
[15]  0.297595  0.324649  0.351703  0.378757  0.405811  0.432865  0.459919
[22]  0.486973  0.514027  0.541081  0.568135  0.595189  0.622243  0.649297
[29]  0.676351  0.703405  0.730459  0.757513  0.784567  0.811621  0.838675
[36]  0.865729  0.892783  0.919837  0.946891  0.973945  1.000999  1.028053
[43]  1.055107  1.082161

请注意,在[0, 1]的每一侧都放置了三个外部结,以构建样条线基础.

Note, three external knots are placed beyond each side of [0, 1] to construct spline basis.

基础班

attr(smooth, "class")告诉您样条曲线的类型.正如从?adaptive.smooth可以看到的,对于bs = admgcv使用P样条曲线,因此会得到"pspline.smooth".

attr(smooth, "class") tells you the type of spline. As you can read from ?adaptive.smooth, for bs = ad, mgcv use P-splines, hence you get "pspline.smooth".

mgcv使用二阶pspline,可以通过检查差异矩阵smooth$D来验证这一点.下面是快照:

mgcv use 2nd order pspline, you can verify this by checking the difference matrix smooth$D. Below is a snapshot:

> smooth$D[1:6,1:6]
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1   -2    1    0    0    0
[2,]    0    1   -2    1    0    0
[3,]    0    0    1   -2    1    0
[4,]    0    0    0    1   -2    1
[5,]    0    0    0    0    1   -2
[6,]    0    0    0    0    0    1

系数

您已经知道b$coefficients包含模型系数:

You have already known that b$coefficients contain model coefficients:

beta <- b$coefficients

请注意,这是一个命名向量:

Note this is a named vector:

> beta
(Intercept)      s(x).1      s(x).2      s(x).3      s(x).4      s(x).5 
 0.37792619 -0.33500685 -0.30943814 -0.30908847 -0.31141148 -0.31373448 
     s(x).6      s(x).7      s(x).8      s(x).9     s(x).10     s(x).11 
-0.31605749 -0.31838050 -0.32070350 -0.32302651 -0.32534952 -0.32767252 
    s(x).12     s(x).13     s(x).14     s(x).15     s(x).16     s(x).17 
-0.32999553 -0.33231853 -0.33464154 -0.33696455 -0.33928755 -0.34161055 
    s(x).18     s(x).19     s(x).20     s(x).21     s(x).22     s(x).23 
-0.34393354 -0.34625650 -0.34857906 -0.05057041  0.48319491  0.77251118 
    s(x).24     s(x).25     s(x).26     s(x).27     s(x).28     s(x).29 
 0.49825345  0.09540020 -0.18950763  0.16117012  1.10141701  1.31089436 
    s(x).30     s(x).31     s(x).32     s(x).33     s(x).34     s(x).35 
 0.62742937 -0.23435309 -0.19127140  0.79615752  1.85600016  1.55794576 
    s(x).36     s(x).37     s(x).38     s(x).39 
 0.40890236 -0.20731309 -0.47246357 -0.44855437

基本矩阵/模型矩阵/线性预测矩阵(lpmatrix)

您可以从以下位置获取模型矩阵:

You can get model matrix from:

mat <- predict.gam(b, type = "lpmatrix")

这是一个n-by-p矩阵,其中n是观察数,而p是系数数.此矩阵的列名称为:

This is an n-by-p matrix, where n is the number of observations, and p is the number of coefficients. This matrix has column name:

> head(mat[,1:5])
  (Intercept)    s(x).1    s(x).2      s(x).3      s(x).4
1           1 0.6465774 0.1490613 -0.03843899 -0.03844738
2           1 0.6437580 0.1715691 -0.03612433 -0.03619157
3           1 0.6384074 0.1949416 -0.03391686 -0.03414389
4           1 0.6306815 0.2190356 -0.03175713 -0.03229541
5           1 0.6207361 0.2437083 -0.02958570 -0.03063719
6           1 0.6087272 0.2688168 -0.02734314 -0.02916029

第一列全为1,给出截距.而s(x).1建议s(x)的第一基函数.如果要查看各个基函数的外观,可以针对变量绘制mat列.例如:

The first column is all 1, giving intercept. While s(x).1 suggests the first basis function for s(x). If you want to view what individual basis function look like, you can plot a column of mat against your variable. For example:

plot(x, mat[, "s(x).20"], type = "l", main = "20th basis")

线性预测器

如果要手动构建拟合,可以执行以下操作:

If you want to manually construct the fit, you can do:

pred.linear <- mat %*% beta

请注意,这正是从b$linear.predictors

predict.gam(b, type = "link")

响应/拟合值

对于非高斯数据,如果要获取响应变量,可以将逆链接函数应用于线性预测变量以映射回原始比例.

For non-Gaussian data, if you want to get response variable, you can apply inverse link function to linear predictor to map back to original scale.

家庭信息存储在gamObject$family中,而gamObject$family$linkinv是反向链接功能.上面的示例可以肯定地为您提供身份链接,但是对于您安装的对象x.gam,您可以执行以下操作:

Family information are stored in gamObject$family, and gamObject$family$linkinv is the inverse link function. The above example will certain gives you identity link, but for your fitted object x.gam, you can do:

x.gam$family$linkinv(x.gam$linear.predictors)

请注意,这与x.gam$fitted

predict.gam(x.gam, type = "response").


其他链接

我刚刚意识到以前有很多类似的问题.

I have just realized that there were quite a lot of similar questions before.

  1. 加文·辛普森(Gavin Simpson)的答案对于predict.gam( , type = 'lpmatrix')来说是很好的.
  2. 此答案是关于predict.gam(, type = 'terms')的.
  1. This answer by Gavin Simpson is great, for predict.gam( , type = 'lpmatrix').
  2. This answer is about predict.gam(, type = 'terms').

但是无论如何,最好的参考始终是?predict.gam,其中包括大量示例.

But anyway, the best reference is always ?predict.gam, which includes extensive examples.

这篇关于mgcv:如何在自适应平滑中提取P样条的结,基,系数和预测?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆