具有聚类标准误差的logit二项式回归 [英] logit binomial regression with clustered standard errors

查看:221
本文介绍了具有聚类标准误差的logit二项式回归的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从状态中复制glm估算值:

I´m trying to replicate a glm estimation from stata:

sysuse auto
logit foreign weight mpg, cluster(rep78)


Logistic regression                               Number of obs   =         69
                                                  Wald chi2(2)    =      31.57
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -22.677963                 Pseudo R2       =     0.4652

                                  (Std. Err. adjusted for 5 clusters in rep78)
------------------------------------------------------------------------------
         |               Robust
 foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
 -------------+----------------------------------------------------------------
  weight |  -.0042281   .0008022    -5.27   0.000    -.0058004   -.0026558
     mpg |  -.1524015   .0271285    -5.62   0.000    -.2055724   -.0992306
   _cons |   14.21689   2.031826     7.00   0.000     10.23458    18.19919

我尝试了在网上找到的不同答案:

I have tried different answers I have found on the web:

data <- read.dta("http://www.stata-press.com/data/r9/auto.dta")
data <- data[,c("foreign", "weight", "mpg", "rep78")]
data <- na.omit(data)
logit.model <- glm(foreign ~ weight + mpg ,  data, family = binomial(logit))
clx <- function(fm, dfcw, cluster){
     library(sandwich);library(lmtest)
     M <- length(unique(cluster))   
     N <- length(cluster)           
     K <- fm$rank                        
     dfc <- (M/(M-1))*((N-1)/(N-K))  
     uj  <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum));
     vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)*dfcw
     coeftest(fm, vcovCL) 
}
clx(logit.model, 1, data$rep78)

z test of coefficients:

               Estimate  Std. Error z value  Pr(>|z|)    
(Intercept) 14.21688944  2.06235711  6.8935 5.443e-12 ***
weight      -0.00422810  0.00081428 -5.1924 2.076e-07 ***
mpg         -0.15240148  0.02753611 -5.5346 3.119e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

或:

logit.model <- lrm(foreign ~ weight + mpg, x=T, y=T, data=data)
robcov(logit.model, cluster=data$rep78)

          Coef    S.E.   Wald Z Pr(>|Z|)
Intercept 14.2169 1.8173  7.82  <0.0001 
weight    -0.0042 0.0007 -5.89  <0.0001 
mpg       -0.1524 0.0243 -6.28  <0.0001 

或:

logit.model <- glm(foreign ~ weight + mpg ,  data, family = binomial(logit))
G <- length(unique(data$rep78))
N <- length(data$rep78)
dfa <- (G/(G - 1)) * (N - 1)/logit.model$df.residual
c_vcov <- dfa * vcovHC(logit.model, type = "HC1", cluster = "group", adjust = T)
coeftest(logit.model, vcov = c_vcov)

z test of coefficients:

              Estimate Std. Error z value  Pr(>|z|)    
(Intercept) 14.2168894  5.0830412  2.7969 0.0051591 ** 
weight      -0.0042281  0.0010915 -3.8736 0.0001072 ***
mpg         -0.1524015  0.1127248 -1.3520 0.1763823  

但是,没有任何一个我得到完全相同的标准错误.我想知道我是在做错什么,还是可以使用另一个软件包来获得与Stata完全相同的结果.

However, with none of the previous I get exactly the same standard errors. I was wondering if I´m doing something wrong or if there is another package which I could use to get exactly the same results as in Stata.

推荐答案

我能够通过执行以下操作在stata中复制结果:

I was able to replicate the results in stata by doing:

clrobustse <- function(fit.model, clusterid) {
    rank=fit.model$rank
    N.obs <- length(clusterid)            
    N.clust <- length(unique(clusterid))  
    dfc <- N.clust/(N.clust-1)                    
    vcv <- vcov(fit.model)
    estfn <- estfun(fit.model)
    uj <- apply(estfn, 2, function(x) tapply(x, clusterid, sum))
    N.VCV <- N.obs * vcv
    ujuj.nobs  <- crossprod(uj)/N.obs
    vcovCL <- dfc*(1/N.obs * (N.VCV %*% ujuj.nobs %*% N.VCV))
    coeftest(fit.model, vcov=vcovCL)
}
clrobustse(logit.model, data$rep78)

这篇关于具有聚类标准误差的logit二项式回归的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆