R中的多项式logit:mlogit与nnet [英] Multinomial logit in R: mlogit versus nnet
问题描述
我想在R中运行多项式logit,并使用了两个库, nnet
和 mlogit
,产生不同的结果并报告不同类型的统计信息.我的问题是:
I want to run a multinomial logit in R and have used two libraries, nnet
and mlogit
, which produce different results and report different types of statistics. My questions are:
-
nnet
报告的系数和标准误差与mlogit
报告的系数和标准误差之间的差异是什么?
What is the source of discrepency between the coefficients and standard errors reported by
nnet
and those reported bymlogit
?
我想使用stargazer
将结果报告到Latex
文件中.这样做时,会有一个权衡的问题:
I would like to report my results to a Latex
file using stargazer
. When doing so, there is a problematic tradeoff:
-
如果我使用
mlogit
的结果,则可以得到所需的统计信息,例如psuedo R平方,但是输出为长格式(请参见下面的示例).
If I use the results from
mlogit
then I get the statistics I wish, such as psuedo R squared, however, the output is in long format (see example below).
如果我使用nnet
的结果,则格式符合预期,但它会报告我对AIC等不感兴趣的统计信息,但不包括psuedo R squared. p>
If I use the results from nnet
then the format is as expected, but it reports statistics that I am not interested in such as AIC, but does not include, for example, psuedo R squared.
当我使用stargazer
时,我希望以nnet
格式由mlogit
报告统计信息.
I would like to have the statistics reported by mlogit
in the formatting of nnet
when I use stargazer
.
这是一个可重现的示例,有三种选择:
Here is a reproducible example, with three choice alternatives:
library(mlogit)
df = data.frame(c(0,1,1,2,0,1,0), c(1,6,7,4,2,2,1), c(683,276,756,487,776,100,982))
colnames(df) <- c('y', 'col1', 'col2')
mydata = df
mldata <- mlogit.data(mydata, choice="y", shape="wide")
mlogit.model1 <- mlogit(y ~ 1| col1+col2, data=mldata)
编译时的tex输出是我认为不希望的长格式":
The tex output when compiled is of what I refer to as "long format" which I deem undesired:
现在,使用nnet
:
library(nnet)
mlogit.model2 = multinom(y ~ 1 + col1+col2, data=mydata)
stargazer(mlogit.model2)
给出tex输出:
这是我想要的宽"格式.注意系数和标准误差的不同.
which is of the "wide" format which I desire. Note the different coefficient and standard errors.
推荐答案
据我所知,有三个R包可用于估计多项式逻辑回归模型:mlogit
,nnet
和globaltest
(摘自生物导体).我在这里不考虑mnlogit
包,它是mlogit
的更快,更有效的实现.
以上所有软件包均使用不同的算法,对于较小的样本,它们给出的结果也不同.对于中等大小的样本(使用n <- 100
尝试),这些差异将消失.
请考虑以下来自 James Keirstead博客中的数据生成过程:
To my knowledge, there are three R packages that allow the estimation of the multinomial logistic regression model: mlogit
, nnet
and globaltest
(from Bioconductor). I do not consider here the mnlogit
package, a faster and more efficient implementation of mlogit
.
All the above packages use different algorithms that, for small samples, give different results. These differencies vanishes for moderate sample sizes (try with n <- 100
).
Consider the following data generating process taken from the James Keirstead's blog:
n <- 40
set.seed(4321)
df1 <- data.frame(x1=runif(n,0,100), x2=runif(n,0,100))
df1 <- transform(df1, y=1+ifelse(100 - x1 - x2 + rnorm(n,sd=10) < 0, 0,
ifelse(100 - 2*x2 + rnorm(n,sd=10) < 0, 1, 2)))
str(df1)
'data.frame': 40 obs. of 3 variables:
$ x1: num 33.48 90.91 41.15 4.38 76.35 ...
$ x2: num 68.6 42.6 49.9 36.1 49.6 ...
$ y : num 1 1 3 3 1 1 1 1 3 3 ...
table(df1$y)
1 2 3
19 8 13
三个软件包估计的模型参数分别为:
The model parameters estimated by the three packages are respectively:
library(mlogit)
df2 <- mlogit.data(df1, choice="y", shape="wide")
mlogit.mod <- mlogit(y ~ 1 | x1+x2, data=df2)
(mlogit.cf <- coef(mlogit.mod))
2:(intercept) 3:(intercept) 2:x1 3:x1 2:x2 3:x2
42.7874653 80.9453734 -0.5158189 -0.6412020 -0.3972774 -1.0666809
#######
library(nnet)
nnet.mod <- multinom(y ~ x1 + x2, df1)
(nnet.cf <- coef(nnet.mod))
(Intercept) x1 x2
2 41.51697 -0.5005992 -0.3854199
3 77.57715 -0.6144179 -1.0213375
#######
library(globaltest)
glbtest.mod <- globaltest::mlogit(y ~ x1+x2, data=df1)
(cf <- glbtest.mod@coefficients)
1 2 3
(Intercept) -41.2442934 1.5431814 39.7011119
x1 0.3856738 -0.1301452 -0.2555285
x2 0.4879862 0.0907088 -0.5786950
globaltest
的mlogit
命令适合模型而无需使用参考结果类别,因此通常的参数可以按以下方式计算:
The mlogit
command of globaltest
fits the model without using a reference outcome category, hence the usual parameters can be calculated as follows:
(glbtest.cf <- rbind(cf[,2]-cf[,1],cf[,3]-cf[,1]))
(Intercept) x1 x2
[1,] 42.78747 -0.5158190 -0.3972774
[2,] 80.94541 -0.6412023 -1.0666813
关于三个软件包中参数的估计,mlogit::mlogit
中使用的方法详细说明
Concerning the estimation of the parameters in the three packages, the method used in mlogit::mlogit
is explained in detail here.
In nnet::multinom
the model is a neural network with no hidden layers, no bias nodes and a softmax output layer; in our case there are 3 input units and 3 output units:
nnet:::summary.nnet(nnet.mod)
a 3-0-3 network with 12 weights
options were - skip-layer connections softmax modelling
b->o1 i1->o1 i2->o1 i3->o1
0.00 0.00 0.00 0.00
b->o2 i1->o2 i2->o2 i3->o2
0.00 41.52 -0.50 -0.39
b->o3 i1->o3 i2->o3 i3->o3
0.00 77.58 -0.61 -1.02
最大条件似然是multinom
中用于模型拟合的方法.
多项式logit模型的参数在globaltest::mlogit
中使用最大似然估计,并与等效对数线性模型和Poisson似然一起使用. 此处进行了描述.
Maximum conditional likelihood is the method used in multinom
for model fitting.
The parameters of multinomial logit models are estimated in globaltest::mlogit
using maximum likelihood and working with an equivalent log-linear model and the Poisson likelihood. The method is described here.
对于通过multinom
估计的模型,可以轻松计算出麦克法登的伪R平方,如下所示:
For models estimated by multinom
the McFadden's pseudo R-squared can be easily calculated as follows:
nnet.mod.loglik <- nnet:::logLik.multinom(nnet.mod)
nnet.mod0 <- multinom(y ~ 1, df1)
nnet.mod0.loglik <- nnet:::logLik.multinom(nnet.mod0)
(nnet.mod.mfr2 <- as.numeric(1 - nnet.mod.loglik/nnet.mod0.loglik))
[1] 0.8483931
这时,我使用stargazer
为mlogit::mlogit
估计的模型生成报告,该报告与multinom
的报告尽可能相似.
基本思想是替换估计的系数.和由multinom
创建的对象的概率以及相应的mlogit
估计.
At this point, using stargazer
, I generate a report for the model estimated by mlogit::mlogit
which is as similar as possible to the report of multinom
.
The basic idea is to substitute the estimated coefficients and probabilities in the object created by multinom
with the corresponding estimates of mlogit
.
# Substitution of coefficients
nnet.mod2 <- nnet.mod
cf <- matrix(nnet.mod2$wts, nrow=4)
cf[2:nrow(cf), 2:ncol(cf)] <- t(matrix(mlogit.cf,nrow=2))
# Substitution of probabilities
nnet.mod2$wts <- c(cf)
nnet.mod2$fitted.values <- mlogit.mod$probabilities
这是结果:
library(stargazer)
stargazer(nnet.mod2, type="text")
==============================================
Dependent variable:
----------------------------
2 3
(1) (2)
----------------------------------------------
x1 -0.516** -0.641**
(0.212) (0.305)
x2 -0.397** -1.067**
(0.176) (0.519)
Constant 42.787** 80.945**
(18.282) (38.161)
----------------------------------------------
Akaike Inf. Crit. 24.623 24.623
==============================================
Note: *p<0.1; **p<0.05; ***p<0.01
现在我正在处理最后一个问题:如何在上述stargazer
输出中可视化loglik,伪R2和其他信息.
Now I am working on the last issue: how to visualize loglik, pseudo R2 and other information in the above stargazer
output.
这篇关于R中的多项式logit:mlogit与nnet的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!