How do I use a custom link function in glm?

Question

I don't want to use the standard log link in glm for Poisson regression, since I have zeros. Consider the following code:

foo = 0:10
bar = 2 * foo
glm(bar ~ foo, family = poisson(link = "identity"))

I get the error:

Error: no valid set of coefficients has been found: please supply starting values

I'm not certain what this means. Is the "identity" link function what I think it is (i.e. it doesn't transform the data at all)? What does this error mean and how can I resolve it?

Recommended answer

You can get an answer if you start somewhere other than the default (0,0) starting point. The start parameter is a vector containing the intercept and slope of the response, on the scale of the link function. The problem R is reporting is typically that the calculated (negative) log-likelihood becomes infinite for the starting values. You can check this for yourself: -sum(dpois(bar,0+0*foo,log=TRUE)) is Inf (because we are setting up a Poisson with zero mean, but get a non-zero response).
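To make that check easy to repeat for other starting points, here is a minimal sketch. The helper name nll is my own and is not part of glm; it just wraps the dpois() call from the paragraph above.

```r
foo <- 0:10
bar <- 2 * foo

# Hypothetical helper: negative Poisson log-likelihood under an identity link,
# for intercept a and slope b.
nll <- function(a, b) -sum(dpois(bar, a + b * foo, log = TRUE))

nll(0, 0)  # Inf: a zero mean cannot produce the non-zero responses
nll(1, 0)  # finite, because every mean is strictly positive
```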

However, this isn't a complete explanation, because even for some starting points like (0,2) where the starting negative-log-likelihood is finite (-sum(dpois(bar,0+2*foo,log=TRUE)) is about 20), the same error occurs -- one would have to dig in deeper to see what's the matter, but I can imagine for instance that a Poisson mean of zero is not allowed at all in the code. The log-likelihood of the Poisson is (a constant plus) x*log(lambda)-lambda: even though this works out OK if lambda and x are both zero, that's not always obvious in the math. In particular, if you look at poisson()$validmu, which is the function that glm uses to establish whether a set of calculated means for the Poisson is OK, you'll see that its definition is function (mu) { all(mu > 0) }. (It would be possible to modify this to allow zero values for mu, but it would be enough trouble that you'd need a good reason to do so -- I tried it, and there's another problem because variances are then calculated to equal zero. In short, it would be easier to do this through a custom maximum likelihood estimator (e.g. bbmle::mle2()) than to hack glm to do it ...)
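As a rough illustration of both points, the sketch below first shows validmu rejecting the means implied by a (0,2) start, then fits the identity-link model by minimizing the negative log-likelihood directly. I use base R's optim() here rather than bbmle::mle2() only to avoid an extra package; the nll function, the starting values c(1, 1), and the penalty value 1e10 are my own illustrative choices.

```r
foo <- 0:10
bar <- 2 * foo

# glm's validity check rejects any fitted mean that is exactly zero.
mu_start <- 0 + 2 * foo          # means implied by start = c(0, 2)
poisson()$validmu(mu_start)      # FALSE, since mu_start[1] is 0

# Negative log-likelihood for the identity-link Poisson model.
nll <- function(par) {
  lambda <- par[1] + par[2] * foo
  if (any(lambda < 0)) return(1e10)  # crude penalty: keep away from negative means
  -sum(dpois(bar, lambda, log = TRUE))
}

fit <- optim(c(1, 1), nll)       # Nelder-Mead is optim's default method
fit$par                          # estimated intercept and slope
```

Unlike glm, this approach has no objection to a mean of exactly zero where the response is also zero, since dpois(0, 0, log = TRUE) is 0.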

However, a starting point where there are no zero estimates of the Poisson mean works out fine, although there are plenty of warnings:

glm(bar ~ foo, family = poisson(link = "identity"), start=c(1,0))

However: I want to point out that you're misunderstanding the purpose of the link function. It's fine to have zeros in the response variable of a Poisson regression, even with a standard log link. The GLM model for a Poisson regression is y ~ Poisson(exp(a+b*x)), not log(y) = a + b*x. The latter is bad if y=0, but the former is perfectly OK. glm(bar ~ foo, family = poisson()) works just fine.
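A small sketch of that distinction, using the data from the question. The object name fit_log is just for illustration.

```r
foo <- 0:10
bar <- 2 * foo

# Transforming the response itself breaks at y = 0 ...
log(bar[1])                                    # -Inf

# ... but the log link is applied to the mean, not to y, so zeros are fine.
fit_log <- glm(bar ~ foo, family = poisson())
coef(fit_log)
```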

In general, non-canonical link functions are a bit of a pain: they're sometimes exactly what you need (although from what you've said I'm not convinced that this is true in your case), but they tend to be fussier and harder to fit than the canonical links.

One final note: I would probably refer to what you want as a "non-canonical" or "non-standard" link; a custom link function, for me, would be one that isn't provided by the family() command in R, so you have to write the link function yourself (e.g. see http://rpubs.com/bbolker/4082).
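For what a hand-written link looks like in practice, here is a minimal sketch. It is not taken from the linked page: the "softplus" link, mu = log(1 + exp(eta)), and the name softplus_link are my own hypothetical example, chosen because the inverse link keeps the mean strictly positive. A link object is a list of class "link-glm" with the components shown.

```r
# Hypothetical custom link: mu = log(1 + exp(eta)), so mu > 0 for any eta.
softplus_link <- structure(
  list(
    linkfun  = function(mu) log(expm1(mu)),    # eta as a function of mu
    linkinv  = function(eta) log1p(exp(eta)),  # mu as a function of eta
    mu.eta   = function(eta) plogis(eta),      # derivative d mu / d eta
    valideta = function(eta) TRUE,             # every real eta is allowed
    name     = "softplus"
  ),
  class = "link-glm"
)

foo <- 0:10
bar <- 2 * foo

# Warnings are suppressed here only to keep the sketch quiet.
fit_sp <- suppressWarnings(
  glm(bar ~ foo, family = poisson(link = softplus_link))
)
coef(fit_sp)
```

The coefficients are on the scale of this link, so they are not directly comparable to the identity-link or log-link fits.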
