Different Robust Standard Errors of Logit Regression in Stata and R


Question

I am trying to replicate a logit regression from Stata in R. In Stata I use the option "robust" to get robust standard errors (heteroscedasticity-consistent standard errors). I am able to replicate exactly the same coefficients as Stata, but I am not able to get the same robust standard errors with the package "sandwich".

I have tried some OLS linear regression examples; it seems that the sandwich estimators of R and Stata give me the same robust standard errors for OLS. Does anybody know how Stata calculates the sandwich estimator for non-linear regression, in my case the logit regression?

Thanks!

Code attached. In R:

library(sandwich)  # sandwich(), vcovHC()
library(lmtest)    # coeftest()
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
myfit <- glm(admit ~ gre + gpa + rank, data = mydata, family = binomial(link = "logit"))
summary(myfit)
# Coefficient tests with each robust covariance estimator tried
coeftest(myfit, vcov = sandwich)
coeftest(myfit, vcov = vcovHC(myfit, "HC0"))
coeftest(myfit, vcov = vcovHC(myfit))
coeftest(myfit, vcov = vcovHC(myfit, "HC3"))
coeftest(myfit, vcov = vcovHC(myfit, "HC1"))
coeftest(myfit, vcov = vcovHC(myfit, "HC2"))
coeftest(myfit, vcov = vcovHC(myfit, "HC"))
coeftest(myfit, vcov = vcovHC(myfit, "const"))
coeftest(myfit, vcov = vcovHC(myfit, "HC4"))
coeftest(myfit, vcov = vcovHC(myfit, "HC4m"))
coeftest(myfit, vcov = vcovHC(myfit, "HC5"))

In Stata:

use http://www.ats.ucla.edu/stat/stata/dae/binary.dta, clear    
logit admit gre gpa i.rank, robust    

Recommended answer

The default so-called "robust" standard errors in Stata correspond to what sandwich() from the package of the same name computes. The only difference is how the finite-sample adjustment is done. By default, the sandwich(...) function applies no finite-sample adjustment at all, i.e., the sandwich is scaled by 1/n, where n is the number of observations. Alternatively, sandwich(..., adjust = TRUE) can be used, which scales by 1/(n - k), where k is the number of regressors. Stata scales by 1/(n - 1).
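To make the three scalings concrete, here is a sketch in my own notation (the symbols below are assumptions for illustration, not taken from either package's documentation). Write s_i for the score contribution of observation i and A for the negative Hessian of the log-likelihood, both evaluated at the estimate. The three variants then differ only by a scalar factor:

```latex
$$
\hat V_0 = A^{-1} \Big( \sum_{i=1}^{n} s_i s_i^\top \Big) A^{-1},
\qquad
\hat V_{\text{adjust}} = \frac{n}{n-k}\, \hat V_0,
\qquad
\hat V_{\text{Stata}} = \frac{n}{n-1}\, \hat V_0 .
$$
```

Standard errors from the unadjusted sandwich therefore need to be multiplied by sqrt(n / (n - 1)) to match Stata's logit output. As far as I know, Stata's regress with the robust option uses the n / (n - k) factor instead, which is what vcovHC(..., "HC1") gives, and that would explain why the OLS results matched.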

Of course, asymptotically these do not differ at all. And except for a few special cases (e.g., OLS linear regression), there is no argument that 1/(n - k) or 1/(n - 1) works "correctly" in finite samples (e.g., gives unbiasedness), at least not to the best of my knowledge.

So to obtain the same results as in Stata you can do:

sandwich1 <- function(object, ...) sandwich(object) * nobs(object) / (nobs(object) - 1)
coeftest(myfit, vcov = sandwich1)

This yields:

z test of coefficients:

              Estimate Std. Error z value  Pr(>|z|)    
(Intercept) -3.9899791  1.1380890 -3.5059 0.0004551 ***
gre          0.0022644  0.0011027  2.0536 0.0400192 *  
gpa          0.8040375  0.3451359  2.3296 0.0198259 *  
rank2       -0.6754429  0.3144686 -2.1479 0.0317228 *  
rank3       -1.3402039  0.3445257 -3.8900 0.0001002 ***
rank4       -1.5514637  0.4160544 -3.7290 0.0001922 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
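To see where the n / (n - 1) factor enters, the computation can also be written out by hand. This is a minimal sketch in base R on simulated data (not the UCLA data set); the names x, y, fit, scores, A_inv, V0 and V_stata are made up for illustration. It relies on the fact that for a logit model the per-observation score is (y_i - p_i) * x_i, and V0 should agree with sandwich(fit) up to numerical tolerance.

```r
# Simulated binary data; all names here are illustrative assumptions
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial(link = "logit"))

X <- model.matrix(fit)   # n x k design matrix
p <- fitted(fit)         # fitted probabilities

# Per-observation score contributions for the logit model: (y_i - p_i) * x_i
scores <- X * (y - p)

# Model-based covariance, i.e. the inverse of the negative Hessian
A_inv <- vcov(fit)

# Unadjusted sandwich: A^{-1} (sum of s_i s_i') A^{-1}
V0 <- A_inv %*% crossprod(scores) %*% A_inv

# Stata-style finite-sample adjustment: multiply by n / (n - 1)
V_stata <- V0 * n / (n - 1)

se_plain <- sqrt(diag(V0))
se_stata <- sqrt(diag(V_stata))
print(cbind(se_plain, se_stata))
```

The two columns differ only by the factor sqrt(n / (n - 1)), so the gap shrinks as n grows.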

And just for the record: in the binary response case, these "robust" standard errors are not robust against anything. Provided that the model is correctly specified, they are consistent and it is fine to use them, but they do not guard against any misspecification in the model. The basic assumption for sandwich standard errors to work is that the model equation (or, more precisely, the corresponding score function) is correctly specified, while the rest of the model may be misspecified. In a binary regression, however, there is no room for such misspecification, because the model equation consists only of the mean (= probability) and the likelihood is the mean or 1 - mean, respectively. This is in contrast to linear or count data regression, where there may be heteroskedasticity, overdispersion, etc.
