Clustered standard errors with texreg?
Question
I'm trying to reproduce this Stata example and move from stargazer to texreg. The data is available here.
To run the regression and get the standard errors, I run this code:
library(readstata13)
library(sandwich)
cluster_se <- function(model_result, data, cluster){
  model_variables <- intersect(colnames(data), c(colnames(model_result$model), cluster))
  model_rows <- as.integer(rownames(model_result$model))
  data <- data[model_rows, model_variables]
  cl <- data[[cluster]]
  M <- length(unique(cl))
  N <- nrow(data)
  K <- model_result$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  uj <- apply(estfun(model_result), 2, function(x) tapply(x, cl, sum))
  vcovCL <- dfc*sandwich(model_result, meat=crossprod(uj)/N)
  sqrt(diag(vcovCL))
}
elemapi2 <- read.dta13(file = 'elemapi2.dta')
lm1 <- lm(formula = api00 ~ acs_k3 + acs_46 + full + enroll, data = elemapi2)
se.lm1 <- cluster_se(model_result = lm1, data = elemapi2, cluster = "dnum")
stargazer::stargazer(lm1, type = "text", style = "aer", se = list(se.lm1))
==========================================================
                                 api00
----------------------------------------------------------
acs_k3                           6.954
                                (6.901)
acs_46                          5.966**
                                (2.531)
full                            4.668***
                                (0.703)
enroll                          -0.106**
                                (0.043)
Constant                         -5.200
                               (121.786)
Observations                      395
R2                               0.385
Adjusted R2                      0.379
Residual Std. Error       112.198 (df = 390)
F Statistic            61.006*** (df = 4; 390)
----------------------------------------------------------
Notes:              ***Significant at the 1 percent level.
                     **Significant at the 5 percent level.
                      *Significant at the 10 percent level.
texreg produces this:
texreg::screenreg(lm1, override.se=list(se.lm1))
========================
             Model 1
------------------------
(Intercept)    -5.20
             (121.79)
acs_k3          6.95
               (6.90)
acs_46          5.97 ***
               (2.53)
full            4.67 ***
               (0.70)
enroll         -0.11 ***
               (0.04)
------------------------
R^2             0.38
Adj. R^2        0.38
Num. obs.     395
RMSE          112.20
========================
How do I fix the p-values?
Recommended answer
First, note that your use of as.integer on the row names is dangerous and likely to cause problems once you use data with non-numeric row names. For instance, with the built-in dataset mtcars, whose row names consist of car names, your function will coerce all row names to NA, and your function will not work.
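A small base-R sketch of the problem described above. It only illustrates the coercion on mtcars; the variable names (row_names, as_int, fit, model_rows) are made up for this example and are not part of the original function.

```r
# Row names of mtcars are car names, not numbers.
row_names <- rownames(mtcars)
head(row_names, 3)

# as.integer() cannot parse them, so every element becomes NA
# (R also emits a coercion warning, silenced here).
as_int <- suppressWarnings(as.integer(row_names))
all(is.na(as_int))

# Indexing by the character row names selects the intended rows instead.
fit <- lm(mpg ~ cyl + disp, data = mtcars)
model_rows <- rownames(fit$model)
all(mtcars[model_rows, "mpg"] == fit$model$mpg)
```

Indexing by character row names works for both numeric-looking and text row names, which is why the function below drops the as.integer call.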
As for your actual question: texreg accepts custom p-values (the override.pvalues argument), which means you need to compute the p-values that correspond to your clustered standard errors. You can either compute the variance-covariance matrix, derive the test statistics, and calculate the p-values by hand, or compute the variance-covariance matrix and pass it to a function such as coeftest, then extract the standard errors and p-values from its result. Since I did not want to download any data, I use the mtcars data in the following:
library(sandwich)
library(lmtest)
library(texreg)

# Note: unlike the version in the question, this returns the full
# cluster-robust variance-covariance matrix, not just the standard errors.
cluster_se <- function(model_result, data, cluster){
  model_variables <- intersect(colnames(data), c(colnames(model_result$model), cluster))
  model_rows <- rownames(model_result$model) # changed to be able to work with mtcars, not tested with other data
  data <- data[model_rows, model_variables]
  cl <- data[[cluster]]
  M <- length(unique(cl))          # number of clusters
  N <- nrow(data)                  # number of observations
  K <- model_result$rank           # number of estimated coefficients
  dfc <- (M/(M-1))*((N-1)/(N-K))   # small-sample correction
  uj <- apply(estfun(model_result), 2, function(x) tapply(x, cl, sum)) # scores summed within clusters
  dfc*sandwich(model_result, meat=crossprod(uj)/N) # returned explicitly
}

lm1 <- lm(formula = mpg ~ cyl + disp, data = mtcars)
vcov.lm1 <- cluster_se(model_result = lm1, data = mtcars, cluster = "carb")
ct.lm1 <- coeftest(lm1, vcov. = vcov.lm1)  # run the test once, reuse the result
standard.errors <- ct.lm1[,2]  # column 2: standard errors
p.values <- ct.lm1[,4]         # column 4: p-values
texreg::screenreg(lm1, override.se=standard.errors, override.pvalues = p.values)
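As an aside, and not part of the approach above: recent versions of sandwich also ship a vcovCL() function that builds a clustered covariance matrix directly. The sketch below swaps it in for the hand-written function, assuming a sandwich version that includes vcovCL() and that its default small-sample adjustment is acceptable; whether it reproduces the hand-computed matrix exactly has not been checked here. The names fit, vc, se_cl and p_cl are illustrative.

```r
library(sandwich)
library(lmtest)

fit <- lm(mpg ~ cyl + disp, data = mtcars)

# Cluster-robust covariance matrix, clustering on carb as in the example above.
vc <- vcovCL(fit, cluster = ~ carb)

# Same extraction as before: standard errors and p-values from coeftest().
ct <- coeftest(fit, vcov. = vc)
se_cl <- ct[, 2]
p_cl <- ct[, 4]
```

The two vectors can then be passed to screenreg through override.se and override.pvalues in the same way as the hand-computed ones.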
For completeness' sake, let's also do it manually:
t.stats <- abs(coefficients(lm1) / sqrt(diag(vcov.lm1)))
t.stats
(Intercept)         cyl        disp
  38.681699    5.365107    3.745143
These are your t-statistics using the cluster-robust standard errors. The degrees of freedom are stored in lm1$df.residual, and using the built-in functions for the t-distribution (see e.g. ?pt), we get:
manual.p <- 2*pt(-t.stats, df=lm1$df.residual)
manual.p
 (Intercept)          cyl         disp
1.648628e-26 9.197470e-06 7.954759e-04
Here, pt is the distribution function, and we want the probability of observing a statistic at least as extreme as the one we observed. Since we are testing two-sided and the density is symmetric, we first take the left tail using the negative value and then double it. Mathematically, this is the same as 2*(1-pt(t.stats, df=lm1$df.residual)). Now, just to check that this yields the same result as before:
all.equal(p.values, manual.p)
[1] TRUE
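To make the tail argument concrete, here is a base-R sketch that uses the t-statistics printed above with 29 residual degrees of freedom (32 observations in mtcars minus 3 coefficients). The names t_stats, df_resid, p_left, p_upper and p_naive are illustrative only.

```r
# t-statistics copied from the output above
t_stats <- c("(Intercept)" = 38.681699, cyl = 5.365107, disp = 3.745143)
df_resid <- 29  # 32 observations - 3 estimated coefficients

# Left tail at -|t|, doubled (the form used above)
p_left <- 2 * pt(-abs(t_stats), df = df_resid)

# Equivalent: upper tail at |t|, doubled
p_upper <- 2 * pt(abs(t_stats), df = df_resid, lower.tail = FALSE)

# The 1 - pt() form is mathematically the same, but subtracting from 1
# can lose precision when the statistic is very large.
p_naive <- 2 * (1 - pt(abs(t_stats), df = df_resid))

all.equal(p_left, p_upper)
p_left
p_naive
```

For very large statistics such as the intercept's, the tail forms (p_left, p_upper) are the safer choice, because the tail probability is computed directly rather than as a difference from 1.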