How to extract fitted splines from a GAM (`mgcv::gam`)


Question

I am using a GAM to model time trends in a logistic regression. I would like to extract the fitted spline from it and add it to another model that cannot be fitted with GAM or GAMM.

I therefore have two questions:

  1. How can I fit a smoother over time so that one knot is forced to be at a particular location while the model finds the other knots?

  2. How can I extract the matrix from the fitted GAM so that I can use it as an input to a different model?

The models I am running have the following form:

gam <- gam(mortality.under.2~ maternal_age_c+ I(maternal_age_c^2)+
           s(birth_year,by=wealth2) + wealth2 + sex +
           residence + maternal_educ + birth_order,
           data=colombia2, family="binomial")

I have read the extensive documentation for GAM but I am still not sure. Any suggestions are much appreciated.

Recommended answer

In mgcv::gam there is a way to do this (your Q2), via the predict.gam method with type = "lpmatrix".

?predict.gam even has an example, which I reproduce below:

 library(mgcv)
 n <- 200
 sig <- 2
 dat <- gamSim(1,n=n,scale=sig)

 b <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3), data = dat)

 newd <- data.frame(x0=(0:30)/30, x1=(0:30)/30, x2=(0:30)/30, x3=(0:30)/30)

 Xp <- predict(b, newd, type="lpmatrix")

 ##################################################################
 ## The following shows how to use an "lpmatrix" as a lookup 
 ## table for approximate prediction. The idea is to create 
 ## approximate prediction matrix rows by appropriate linear 
 ## interpolation of an existing prediction matrix. The additivity 
 ## of a GAM makes this possible. 
 ## There is no reason to ever do this in R, but the following 
 ## code provides a useful template for predicting from a fitted 
 ## gam *outside* R: all that is needed is the coefficient vector 
 ## and the prediction matrix. Use larger `Xp'/ smaller `dx' and/or 
 ## higher order interpolation for higher accuracy.  
 ###################################################################

 xn <- c(.341,.122,.476,.981) ## want prediction at these values
 x0 <- 1         ## intercept column
 dx <- 1/30      ## covariate spacing in `newd'
 for (j in 0:2) { ## loop through smooth terms
   cols <- 1+j*9 +1:9      ## relevant cols of Xp
   i <- floor(xn[j+1]*30)  ## find relevant rows of Xp
   w1 <- (xn[j+1]-i*dx)/dx ## interpolation weights
   ## find approx. predict matrix row portion, by interpolation
   x0 <- c(x0,Xp[i+2,cols]*w1 + Xp[i+1,cols]*(1-w1))
 }
 dim(x0)<-c(1,28) 
 fv <- x0%*%coef(b) + xn[4];fv    ## evaluate and add offset
 se <- sqrt(x0%*%b$Vp%*%t(x0));se ## get standard error
 ## compare to normal prediction
 predict(b,newdata=data.frame(x0=xn[1],x1=xn[2],
         x2=xn[3],x3=xn[4]),se=TRUE)

That example goes through the entire process, including the prediction step, which would be done outside R or outside the GAM model. You will have to modify the example a bit to do what you want, because it evaluates all terms in the model and your model has other terms besides the spline. Essentially you do the same thing, but only for the spline terms, which involves finding the relevant columns and rows of the Xp matrix for the spline. Note also that the spline is centred, so you may or may not want to undo that too.
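As a rough sketch of what "only the spline columns" could look like, the code below fits a small model to simulated data and pulls out the part of the lpmatrix that belongs to one smooth. The data set, the model formula and the choice of `s(x2)` are illustrative assumptions, not the asker's model; the `first.para` and `last.para` fields of each element of `b$smooth` are used to find the coefficient indices.

```r
library(mgcv)

set.seed(1)                                   # reproducible simulated data
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)
b   <- gam(y ~ s(x0) + s(x1) + s(x2), data = dat)

# Grid over the covariate of interest; the other covariates are held at their means
newd <- data.frame(x0 = mean(dat$x0),
                   x1 = mean(dat$x1),
                   x2 = seq(min(dat$x2), max(dat$x2), length.out = 50))

Xp <- predict(b, newd, type = "lpmatrix")     # one row per grid point, one column per coefficient

# Find the coefficient indices that belong to s(x2)
labels <- sapply(b$smooth, function(s) s$label)
sm     <- b$smooth[[which(labels == "s(x2)")]]
cols   <- sm$first.para:sm$last.para

X_spline <- Xp[, cols, drop = FALSE]          # basis functions evaluated on the grid
f_x2     <- drop(X_spline %*% coef(b)[cols])  # centred estimate of s(x2) on the grid
```

Here `X_spline` is the matrix that could be carried over to another model as a set of basis columns, while `f_x2` is the fitted (centred) spline itself.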

For your Q1, choose appropriate values for the xn vector/matrix in the example. These correspond to values for the nth term in the model. So set the ones you want fixed to some mean value and then vary the one associated with the spline.

If you are doing all of this in R, it would be easier to just evaluate the spline at the values of the spline covariate for which you have data going into the other model. You do that by creating a data frame of values at which to predict, then using

predict(mod, newdata = newdat, type = "terms")

where mod is the fitted GAM model (via mgcv::gam) and newdat is a data frame containing a column for each variable in the model (including the parametric terms; set the terms you don't want to vary to some constant value, say the average of the variable in the data set, or to a certain level if a factor). The type = "terms" part returns a matrix with one row for each row in newdat, holding the "contribution" to the fitted value of each term in the model, including the spline term. Just take the column of this matrix that corresponds to the spline; again, it is centred.
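A minimal sketch of that workflow, again on simulated data rather than the asker's: the model formula, the column name `spline_x2` and the downstream `lm` call are all hypothetical and only show where the extracted values could go.

```r
library(mgcv)

set.seed(2)
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)
mod <- gam(y ~ x1 + s(x2), data = dat)

# One row per value at which the spline is wanted; x1 is held at its mean
newdat <- data.frame(x1 = mean(dat$x1), x2 = dat$x2)

tt <- predict(mod, newdata = newdat, type = "terms")
colnames(tt)                       # one column per model term

spline_x2 <- tt[, "s(x2)"]         # centred contribution of the smooth

# Hypothetical downstream use: feed the fitted spline into another model
dat$spline_x2 <- spline_x2
other <- lm(y ~ x1 + spline_x2, data = dat)
```

With a `by` factor, as in the question, there should be one such column per factor level, so it is worth checking `colnames(tt)` before indexing.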

Perhaps I misunderstood your Q1. If you want to control the knots, see the knots argument to mgcv::gam. By default, mgcv::gam places a knot at each extreme of the data and then spreads the remaining "knots" evenly over the interval. mgcv::gam doesn't find the knots; it places them for you, and you can control where it places them via the knots argument.
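As an illustration of the knots argument, the sketch below fixes the knots of a cubic regression spline by hand. The choice of `bs = "cr"`, the value `k = 5` and the knot positions (including the "forced" knot at 0.3) are assumptions made for the example; as far as I understand it, the number of supplied knots has to match `k` for this basis, so the remaining knots must be supplied too rather than left for the model to choose.

```r
library(mgcv)

set.seed(3)
dat <- gamSim(1, n = 200, scale = 2, verbose = FALSE)

# Five knots: the two extremes of the data, one forced at x2 = 0.3
# (an arbitrary illustrative value) and two more chosen by hand
kn <- list(x2 = c(min(dat$x2), 0.3, 0.5, 0.8, max(dat$x2)))

m <- gam(y ~ s(x2, bs = "cr", k = 5), data = dat, knots = kn)

m$smooth[[1]]$xp    # knot locations stored with the fitted smooth
```

The names in the `knots` list have to match the covariate names used inside `s()`.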
