R中的多项式logit:mlogit与nnet [英] Multinomial logit in R: mlogit versus nnet

查看:205
本文介绍了R中的多项式logit:mlogit与nnet的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在R中运行多项式logit,并使用了两个库, nnet mlogit ,产生不同的结果并报告不同类型的统计信息.我的问题是:

I want to run a multinomial logit in R and have used two libraries, nnet and mlogit, which produce different results and report different types of statistics. My questions are:

  1. nnet报告的系数和标准误差与mlogit报告的系数和标准误差之间的差异是什么?

  1. What is the source of discrepency between the coefficients and standard errors reported by nnet and those reported by mlogit?

我想使用stargazer将结果报告到Latex文件中.这样做时,会有一个权衡的问题:

I would like to report my results to a Latex file using stargazer. When doing so, there is a problematic tradeoff:

  • 如果我使用mlogit的结果,则可以得到所需的统计信息,例如psuedo R平方,但是输出为长格式(请参见下面的示例).

  • If I use the results from mlogit then I get the statistics I wish, such as psuedo R squared, however, the output is in long format (see example below).

如果我使用nnet的结果,则格式符合预期,但它会报告我对AIC等不感兴趣的统计信息,但不包括psuedo R squared. p>

If I use the results from nnet then the format is as expected, but it reports statistics that I am not interested in such as AIC, but does not include, for example, psuedo R squared.

当我使用stargazer时,我希望以nnet格式由mlogit报告统计信息.

I would like to have the statistics reported by mlogit in the formatting of nnet when I use stargazer.

这是一个可重现的示例,有三种选择:

Here is a reproducible example, with three choice alternatives:

library(mlogit)

df = data.frame(c(0,1,1,2,0,1,0), c(1,6,7,4,2,2,1), c(683,276,756,487,776,100,982))
colnames(df) <- c('y', 'col1', 'col2')
mydata = df

mldata <- mlogit.data(mydata, choice="y", shape="wide")
mlogit.model1 <- mlogit(y ~ 1| col1+col2, data=mldata)

编译时的tex输出是我认为不希望的长格式":

The tex output when compiled is of what I refer to as "long format" which I deem undesired:

现在,使用nnet:

library(nnet)
mlogit.model2 = multinom(y ~ 1 + col1+col2, data=mydata)
stargazer(mlogit.model2)

给出tex输出:

这是我想要的宽"格式.注意系数和标准误差的不同.

which is of the "wide" format which I desire. Note the different coefficient and standard errors.

推荐答案

据我所知,有三个R包可用于估计多项式逻辑回归模型:mlogitnnetglobaltest(摘自生物导体).我在这里不考虑mnlogit包,它是mlogit的更快,更有效的实现.
以上所有软件包均使用不同的算法,对于较小的样本,它们给出的结果也不同.对于中等大小的样本(使用n <- 100尝试),这些差异将消失.
请考虑以下来自 James Keirstead博客中的数据生成过程:

To my knowledge, there are three R packages that allow the estimation of the multinomial logistic regression model: mlogit, nnet and globaltest (from Bioconductor). I do not consider here the mnlogit package, a faster and more efficient implementation of mlogit.
All the above packages use different algorithms that, for small samples, give different results. These differencies vanishes for moderate sample sizes (try with n <- 100).
Consider the following data generating process taken from the James Keirstead's blog:

n <- 40
set.seed(4321)
df1 <- data.frame(x1=runif(n,0,100), x2=runif(n,0,100))
df1 <- transform(df1, y=1+ifelse(100 - x1 - x2 + rnorm(n,sd=10) < 0, 0,
      ifelse(100 - 2*x2 + rnorm(n,sd=10) < 0, 1, 2)))
str(df1)
'data.frame':   40 obs. of  3 variables:
 $ x1: num  33.48 90.91 41.15 4.38 76.35 ...
 $ x2: num  68.6 42.6 49.9 36.1 49.6 ...
 $ y : num  1 1 3 3 1 1 1 1 3 3 ...
table(df1$y)
 1  2  3 
19  8 13 

三个软件包估计的模型参数分别为:

The model parameters estimated by the three packages are respectively:

library(mlogit)
df2 <- mlogit.data(df1, choice="y", shape="wide")
mlogit.mod <- mlogit(y ~ 1 | x1+x2, data=df2)
(mlogit.cf <- coef(mlogit.mod))

2:(intercept) 3:(intercept)          2:x1          3:x1          2:x2          3:x2 
   42.7874653    80.9453734    -0.5158189    -0.6412020    -0.3972774    -1.0666809 
#######
library(nnet)
nnet.mod <- multinom(y ~ x1 + x2, df1)
(nnet.cf <- coef(nnet.mod))

  (Intercept)         x1         x2
2    41.51697 -0.5005992 -0.3854199
3    77.57715 -0.6144179 -1.0213375
#######
library(globaltest)
glbtest.mod <- globaltest::mlogit(y ~ x1+x2, data=df1)
(cf <- glbtest.mod@coefficients)

                      1          2          3
(Intercept) -41.2442934  1.5431814 39.7011119
x1            0.3856738 -0.1301452 -0.2555285
x2            0.4879862  0.0907088 -0.5786950

globaltestmlogit命令适合模型而无需使用参考结果类别,因此通常的参数可以按以下方式计算:

The mlogit command of globaltest fits the model without using a reference outcome category, hence the usual parameters can be calculated as follows:

(glbtest.cf <- rbind(cf[,2]-cf[,1],cf[,3]-cf[,1]))
     (Intercept)         x1         x2
[1,]    42.78747 -0.5158190 -0.3972774
[2,]    80.94541 -0.6412023 -1.0666813

关于三个软件包中参数的估计,mlogit::mlogit中使用的方法详细说明

Concerning the estimation of the parameters in the three packages, the method used in mlogit::mlogit is explained in detail here.
In nnet::multinom the model is a neural network with no hidden layers, no bias nodes and a softmax output layer; in our case there are 3 input units and 3 output units:

nnet:::summary.nnet(nnet.mod)
a 3-0-3 network with 12 weights
options were - skip-layer connections  softmax modelling 
 b->o1 i1->o1 i2->o1 i3->o1 
  0.00   0.00   0.00   0.00 
 b->o2 i1->o2 i2->o2 i3->o2 
  0.00  41.52  -0.50  -0.39 
 b->o3 i1->o3 i2->o3 i3->o3 
  0.00  77.58  -0.61  -1.02

最大条件似然是multinom中用于模型拟合的方法.
多项式logit模型的参数在globaltest::mlogit中使用最大似然估计,并与等效对数线性模型和Poisson似然一起使用.
此处进行了描述.

Maximum conditional likelihood is the method used in multinom for model fitting.
The parameters of multinomial logit models are estimated in globaltest::mlogit using maximum likelihood and working with an equivalent log-linear model and the Poisson likelihood. The method is described here.

对于通过multinom估计的模型,可以轻松计算出麦克法登的伪R平方,如下所示:

For models estimated by multinom the McFadden's pseudo R-squared can be easily calculated as follows:

nnet.mod.loglik <- nnet:::logLik.multinom(nnet.mod)
nnet.mod0 <- multinom(y ~ 1, df1)
nnet.mod0.loglik <- nnet:::logLik.multinom(nnet.mod0)
(nnet.mod.mfr2 <- as.numeric(1 - nnet.mod.loglik/nnet.mod0.loglik))
[1] 0.8483931

这时,我使用stargazermlogit::mlogit估计的模型生成报告,该报告与multinom的报告尽可能相似.
基本思想是替换估计的系数.和由multinom创建的对象的概率以及相应的mlogit估计.

At this point, using stargazer, I generate a report for the model estimated by mlogit::mlogit which is as similar as possible to the report of multinom.
The basic idea is to substitute the estimated coefficients and probabilities in the object created by multinom with the corresponding estimates of mlogit.

# Substitution of coefficients
nnet.mod2 <- nnet.mod
cf <- matrix(nnet.mod2$wts, nrow=4)
cf[2:nrow(cf), 2:ncol(cf)] <- t(matrix(mlogit.cf,nrow=2))
# Substitution of probabilities
nnet.mod2$wts <- c(cf)
nnet.mod2$fitted.values <- mlogit.mod$probabilities

这是结果:

library(stargazer)
stargazer(nnet.mod2, type="text")

==============================================
                      Dependent variable:     
                  ----------------------------
                        2              3      
                       (1)            (2)     
----------------------------------------------
x1                   -0.516**      -0.641**   
                     (0.212)        (0.305)   

x2                   -0.397**      -1.067**   
                     (0.176)        (0.519)   

Constant             42.787**      80.945**   
                     (18.282)      (38.161)   

----------------------------------------------
Akaike Inf. Crit.     24.623        24.623    
==============================================
Note:              *p<0.1; **p<0.05; ***p<0.01

现在我正在处理最后一个问题:如何在上述stargazer输出中可视化loglik,伪R2和其他信息.

Now I am working on the last issue: how to visualize loglik, pseudo R2 and other information in the above stargazer output.

这篇关于R中的多项式logit:mlogit与nnet的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆