自定义链接功能适用于GLM,但不适用于mgcv GAM [英] Custom Link function works for GLM but not mgcv GAM

查看:176
本文介绍了自定义链接功能适用于GLM,但不适用于mgcv GAM的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

很抱歉,答案很明显,但是我花了很多时间尝试在mgcv.gam中使用自定义链接功能

Apologies if the answer is obvious but I've spent quite some time trying to use a custom link function in mgcv.gam

简而言之,

  • 我想使用来自软件包 psyphy (我想使用 psyphy.probit_2asym ,我称为custom_link)
  • 我可以使用此链接创建一个{stats} family对象,并将其用于glm的"family"参数中.

  • I want to use a modified probit link from package psyphy ( I want to use psyphy.probit_2asym, I call it custom_link )
  • I can create a {stats}family object with this link and use it in the 'family' argument of glm.

m <- glm(y~x, family=binomial(link=custom_link), ... )

用作{mgcv} gam

It does not work when used as an argument for {mgcv}gam

m <- gam(y~s(x), family=binomial(link=custom_link), ... )

我收到错误Error in fix.family.link.family(family) : link not recognised

如果指定标准的link=probit,我不会得到此错误的原因,无论是glm还是gam都能正常工作.

I do not get the reason for this error, both glm and gam work if I specify the standard link=probit.

所以我的问题可以概括为:

So my question can be summarized as:

此自定义链接中缺少哪些适用于glm但不适用于gam的东西?

如果您能给我提示我应该做什么,请提前感谢.

Thanks in advance if you can give me a hint on what I should do.

链接功能

probit.2asym <- function(g, lam) {
    if ((g < 0 ) || (g > 1))
        stop("g must in (0, 1)")
    if ((lam < 0) || (lam > 1))
        stop("lam outside (0, 1)")
    linkfun <- function(mu) {
        mu <- pmin(mu, 1 - (lam + .Machine$double.eps))
        mu <- pmax(mu, g + .Machine$double.eps)
        qnorm((mu - g)/(1 - g - lam))
        }
    linkinv <- function(eta) {
        g + (1 - g - lam) * 
         pnorm(eta)
        }
    mu.eta <- function(eta) {
        (1 - g - lam) * dnorm(eta)      }
    valideta <- function(eta) TRUE
    link <- paste("probit.2asym(", g, ", ", lam, ")", sep = "")
    structure(list(linkfun = linkfun, linkinv = linkinv, 
    mu.eta = mu.eta, valideta = valideta, name = link), 
    class = "link-glm")
}

推荐答案

如您所知,glm采用迭代加权最小二乘法拟合迭代. gam的早期版本通过拟合经惩罚的加权加权最小二乘法对此进行了扩展,该功能由gam.fit函数完成.在某些情况下,这称为性能迭代.

As you may know, glm takes iteratively reweighted least squares fitting iterations. Early version of gam extends this by fitting an iteratively penalized reweighted least squares, which is done by the gam.fit function. This is known as performance iteration in some context.

自2008年以来(甚至可能更早),基于外部迭代gam.fit3已将gam.fit替换为gam的默认设置.此类更改确实需要该家族的一些其他信息,您可以从中获得有关?fix.family.link的信息.

Since 2008 (or maybe slightly even earlier), gam.fit3 based on what is called outer iteration has replaced gam.fit as gam default. Such change does require some extra information of the family, regarding which you can read about ?fix.family.link.

两次迭代之间的主要区别是系数beta的迭代和平滑参数lambda的迭代是否嵌套.

The major difference between two iterations is whether iteration of coefficients beta and iteration of smoothing parameters lambda are nested.

  • 性能迭代采用嵌套方式,其中对于beta的每次更新,都会执行lambda的单个迭代;
  • 外部迭代将这2个迭代完全分开,对于beta的每次更新,lambda的迭代一直进行到结束,直到收敛为止.
  • Performance iteration takes the nested way, where for each update of beta, a single iteration of lambda is performed;
  • Outer iteration completely separate those 2 iterations, where for each update of beta, iteration of lambda is carried to the end till convergence.

显然,外部迭代更稳定,并且收敛失败的可能性较小.

Obviously outer iteration is more stable and less likely to suffer from failure of convergence.

gam具有参数optimizer.默认情况下,它采用optimizer = c("outer", "newton"),这是外部迭代的牛顿方法.但是如果设置optimizer = "perf",将需要性能迭代.

gam has an argument optimizer. By default it takes optimizer = c("outer", "newton"), that is the newton method of outer iteration; but if you set optimizer = "perf", it will take performance iteration.

因此,在完成以上概述之后,我们有两个选择:

So, after the above overview, we have two options:

  • 仍然使用外部迭代,但扩展了自定义链接功能;
  • 使用性能迭代来与glm保持一致.
  • still use outer iteration, but expand your customized link function;
  • use performance iteration to stay in line with glm.

我很懒,因此将演示第二个(实际上,我对采取第一个方法并不感到太自信).

I am being lazy so will demonstrate the second one (actually I am not feeling too confident to take the first approach).

可复制示例

您没有提供可复制的示例,因此我准备如下示例.

You did not provide a reproducible example, so I prepare one as below.

set.seed(0)
x <- sort(runif(500, 0, 1))    ## covariates (sorted to make plotting easier)
eta <- -4 + 3 * x * exp(x) - 2 * log(x) * sqrt(x)   ## true linear predictor
p <- binomial(link = "logit")$linkinv(eta)    ## true probability (response)
y <- rbinom(500, 1, p)    ## binary observations

table(y)    ## a quick check that data are not skewed
#  0   1 
#271 229 

我将使用您打算使用的功能probit.2asymg = 0.1lam = 0.1:

I will take g = 0.1 and lam = 0.1 of the function probit.2asym you intend to use:

probit2 <- probit.2asym(0.1, 0.1)

par(mfrow = c(1,3))

## fit a glm with logit link
glm_logit <- glm(y ~ x, family = binomial(link = "logit"))
plot(x, eta, type = "l", main = "glm with logit link")
lines(x, glm_logit$linear.predictors, col = 2)

## glm with probit.2asym
glm_probit2 <- glm(y ~ x, family = binomial(link = probit2))
plot(x, eta, type = "l", main = "glm with probit2")
lines(x, glm_probit2$linear.predictors, col = 2)

## gam with probit.2aysm
library(mgcv)
gam_probit2 <- gam(y ~ s(x, bs = 'cr', k = 3), family = binomial(link = probit2),
                   optimizer = "perf")
plot(x, eta, type = "l", main = "gam with probit2")
lines(x, gam_probit2$linear.predictors, col = 2)

我对s(x)使用了自然三次样条曲线基cr,因为对于单变量平滑,不需要使用薄板样条线的默认设置.我还设置了较小的基本尺寸k = 3(对于三次样条曲线不能较小),因为我的玩具数据接近线性,并且不需要较大的基本尺寸.更重要的是,这似乎可以防止玩具数据集性能迭代收敛失败.

I have used natural cubic spline basis cr for s(x), as for univariate smooth the default setting with thin-plate spline is unnecessary. I have also set a small basis dimension k = 3 (can't be smaller for a cubic spline) as my toy data is near linear and big basis dimension is not needed. More importantly, this seems to prevent convergence failure of performance iteration for my toy dataset.

这篇关于自定义链接功能适用于GLM,但不适用于mgcv GAM的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆