Can multinomial models be estimated using a generalized linear model?


Question

In the analysis of categorical data, we often use logistic regression to estimate relationships between binomial outcomes and one or more covariates.

I understand this is a type of generalized linear model (GLM). In R, it is fitted with the glm function using the argument family=binomial. Categorical data analysis also includes multinomial models. Are these not GLMs? And can't they be estimated in R using the glm function?

(In this post on multinomial logistic regression, the author uses an external package, mlogit, which also seems outdated.)

Why is the class of GLMs restricted to dichotomous outcomes? Is it because multi-class classification can be treated as multiple binary classification models?

Recommended answer

The GLMs in R are estimated with Fisher scoring. Two approaches to multi-category logit models come to mind: proportional odds models, and log-linear models or multinomial regression.

The proportional odds model is a special type of cumulative link model and is implemented in the MASS package. It is not estimated with Fisher scoring, so the default glm.fit workhorse cannot estimate such a model. Interestingly, however, cumulative link models are GLMs and were discussed in the eponymous text by McCullagh and Nelder. A similar issue arises with negative binomial GLMs: they are GLMs in the strict sense of having a link function and a probability model, but they require specialized estimation routines. As for the R function glm, one should not regard it as an exhaustive estimator for every type of GLM.
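As a rough illustration of the point above, the sketch below fits one model of each kind with the dedicated routines in MASS rather than glm. The data sets (housing, quine) and the formulas are assumptions chosen only because they ship with MASS; they are not part of the original answer.

```r
library(MASS)  # polr(), glm.nb(), and the example data sets

# Proportional odds (cumulative logit) model; polr() does its own
# optimization rather than going through glm.fit.
fit_polr <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
                 data = housing, Hess = TRUE)

# Negative binomial GLM; glm.nb() estimates the extra dispersion
# parameter theta in addition to the regression coefficients.
fit_nb <- glm.nb(Days ~ Eth + Sex + Age + Lrn, data = quine)

coef(fit_polr)  # slope coefficients shared across the cumulative logits
fit_polr$zeta   # cutpoints (intercepts) between ordered categories
fit_nb$theta    # estimated negative binomial dispersion parameter
```

Both fitted objects support the usual summary(), coef(), and predict() methods, so they behave much like glm fits even though the estimation routine differs.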

nnet has an implementation of a log-linear model estimator. It is adapted to the package's more sophisticated neural network estimator using softmax entropy, which is an equivalent formulation (theory exists to show this). It turns out you can estimate log-linear models with glm in default R if you're keen. The key lies in seeing the link between logistic and Poisson regression. Recognizing the interaction terms of a count model (differences in log relative rates) as first-order terms in a logistic model for an outcome (log odds ratios), you can estimate the same parameters and the same SEs by "conditioning" on the margins of the $K \times 2$ contingency table for a multi-category outcome. A related SE question covers that background.
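The link between the two models can be sketched as follows. The notation ($n_{kj}$, $\mu_{kj}$, $\lambda$, $\alpha_k$, $\beta_j$, $\gamma_{kj}$, $\pi_{k \mid j}$) is introduced here for illustration and does not come from the original answer. Let $n_{kj}$ be the count for outcome category $k = 1, \dots, K$ in group $j \in \{1, 2\}$, and fit the saturated Poisson log-linear model with reference-level constraints $\alpha_1 = \beta_1 = \gamma_{1j} = \gamma_{k1} = 0$:

$$
n_{kj} \sim \mathrm{Poisson}(\mu_{kj}), \qquad \log \mu_{kj} = \lambda + \alpha_k + \beta_j + \gamma_{kj}.
$$

Conditioning on the group totals $n_{+j}$ turns each column into a multinomial with probabilities

$$
\pi_{k \mid j} = \frac{\mu_{kj}}{\sum_{l=1}^{K} \mu_{lj}},
$$

so the baseline-category logits are

$$
\log \frac{\pi_{k \mid j}}{\pi_{1 \mid j}} = \log \mu_{kj} - \log \mu_{1j} = \alpha_k + \gamma_{kj}.
$$

Under these assumptions, the main effect $\alpha_k$ of the Poisson model plays the role of the multinomial intercept for category $k$, and the interaction $\gamma_{k2}$ plays the role of the group coefficient, while $\lambda$ and $\beta_j$ only fit the group totals.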

Take as an example the following, using the VA lung cancer data from the MASS package:

> library(MASS)   # provides the VA data
> library(nnet)   # provides multinom()
> summary(multinom(cell ~ factor(treat), data=VA))
# weights:  12 (6 variable)
initial  value 189.922327 
iter  10 value 182.240520
final  value 182.240516 
converged
Call:
multinom(formula = cell ~ factor(treat), data = VA)

Coefficients:
    (Intercept) factor(treat)2
2  6.931413e-01     -0.7985009
3 -5.108233e-01      0.4054654
4 -9.538147e-06     -0.5108138

Std. Errors:
  (Intercept) factor(treat)2
2   0.3162274      0.4533822
3   0.4216358      0.5322897
4   0.3651485      0.5163978

Residual Deviance: 364.481 
AIC: 376.481 

Compare with:

> VA.tab <- table(VA[, c('cell', 'treat')])
> summary(glm(Freq ~ cell * treat, data=VA.tab, family=poisson))

Call:
glm(formula = Freq ~ cell * treat, family = poisson, data = VA.tab)

Deviance Residuals: 
[1]  0  0  0  0  0  0  0  0

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)   2.708e+00  2.582e-01  10.488   <2e-16 ***
cell2         6.931e-01  3.162e-01   2.192   0.0284 *  
cell3        -5.108e-01  4.216e-01  -1.212   0.2257    
cell4        -1.571e-15  3.651e-01   0.000   1.0000    
treat2        2.877e-01  3.416e-01   0.842   0.3996    
cell2:treat2 -7.985e-01  4.534e-01  -1.761   0.0782 .  
cell3:treat2  4.055e-01  5.323e-01   0.762   0.4462    
cell4:treat2 -5.108e-01  5.164e-01  -0.989   0.3226    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1.5371e+01  on 7  degrees of freedom
Residual deviance: 4.4409e-15  on 0  degrees of freedom
AIC: 53.066

Number of Fisher Scoring iterations: 3

Compare the two fits. The cell:treat2 interaction terms in the Poisson model match the factor(treat)2 coefficients in the multinomial model, and the cell main effects match the multinomial intercepts, with the same standard errors. The AICs differ because the log-linear model is also a probability model for the margins of the table, which are accounted for by the other parameters in the model (the intercept and the treat main effect). In terms of prediction and inference, however, the two approaches yield identical results.
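The comparison can also be done programmatically rather than by eye. The following is a minimal sketch that refits both models and lines up the matching coefficients; the object names (fit_mn, fit_po, b_mn, b_po) are hypothetical, and the coefficient labels are taken from the output shown above.

```r
library(MASS)  # VA data
library(nnet)  # multinom()

# Multinomial logit fit on the individual records
fit_mn <- multinom(cell ~ factor(treat), data = VA, trace = FALSE)

# Poisson log-linear fit on the cell-by-treat table of counts
VA.tab <- as.data.frame(table(VA[, c("cell", "treat")]))
fit_po <- glm(Freq ~ cell * treat, data = VA.tab, family = poisson)

b_mn <- coef(fit_mn)  # matrix: one row per non-reference cell type
b_po <- coef(fit_po)  # named vector

# Multinomial intercepts vs. Poisson main effects for cell
cbind(multinom = b_mn[, "(Intercept)"],
      poisson  = b_po[c("cell2", "cell3", "cell4")])

# Multinomial treatment effects vs. Poisson cell:treat interactions
cbind(multinom = b_mn[, "factor(treat)2"],
      poisson  = b_po[c("cell2:treat2", "cell3:treat2", "cell4:treat2")])
```

The two columns in each table should agree up to the convergence tolerance of multinom, which uses an iterative optimizer rather than Fisher scoring.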

So in short: trick question! glm handles multi-category logistic regression; it just takes a deeper understanding of what constitutes such a model.
