R计算具有奇异性的lm模型的鲁棒标准误差(vcovHC) [英] R calculate robust standard errors (vcovHC) for lm model with singularities
问题描述
在R中,当由于奇异性而使某些系数下降时,如何使用vcovHC()计算鲁棒的标准误差?标准lm函数似乎可以很好地计算出所有实际估计的系数的正常标准误差,但是vcovHC()会引发错误:面包错误.%*%肉类.:不合格参数".
In R, how can I calculate robust standard errors using vcovHC() when some coefficients are dropped due to singularities? The standard lm function seems to do fine calculating normal standard errors for all coefficients that are actually estimated, but vcovHC() throws an error: "Error in bread. %*% meat. : non-conformable arguments".
(我正在使用的实际数据要复杂一些.实际上,它是一个使用两个不同固定效果的模型,我遇到了局部奇异点,我无法简单地摆脱它们.至少我不知道如何.对于我使用的两个固定效果,第一个因子具有150个级别,第二个因子具有142个级别,并且总共有9个奇点,这是因为数据是在10个块中收集的.)
(The actual data I'm using is a bit more complicated. In fact, it is a model using two different fixed effects and I run into local singularities which I cannot simply get rid of. At least I would not know how. For the two fixed effects I'm using the first factor has 150 levels, the second has 142 levels and there are in total 9 singularities which result from the fact that the data was collected in ten blocks.)
这是我的输出:
Call:
lm(formula = one ~ two + three + Jan + Feb + Mar + Apr + May +
Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)
Residuals:
Min 1Q Median 3Q Max
-130.12 -60.95 0.08 61.05 137.35
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1169.74313 57.36807 20.390 <2e-16 ***
two -0.07963 0.06720 -1.185 0.237
three -0.04053 0.06686 -0.606 0.545
Jan 8.10336 22.05552 0.367 0.714
Feb 0.44025 22.11275 0.020 0.984
Mar 19.65066 22.02454 0.892 0.373
Apr -13.19779 22.02886 -0.599 0.550
May 15.39534 22.10445 0.696 0.487
Jun -12.50227 22.07013 -0.566 0.572
Jul -20.58648 22.06772 -0.933 0.352
Aug -0.72223 22.36923 -0.032 0.974
Sep 12.42204 22.09296 0.562 0.574
Oct 25.14836 22.04324 1.141 0.255
Nov 18.13337 22.08717 0.821 0.413
Dec NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 69.63 on 226 degrees of freedom
Multiple R-squared: 0.04878, Adjusted R-squared: -0.005939
F-statistic: 0.8914 on 13 and 226 DF, p-value: 0.5629
> model$se <- vcovHC(model)
Error in bread. %*% meat. : non-conformable arguments
以下是用于重现该错误的最少代码.
Here is a minimal code snipped to reproduce the error.
library(sandwich)
set.seed(101)
dat<-data.frame(one=c(sample(1000:1239)),
two=c(sample(200:439)),
three=c(sample(600:839)),
Jan=c(rep(1,20),rep(0,220)),
Feb=c(rep(0,20),rep(1,20),rep(0,200)),
Mar=c(rep(0,40),rep(1,20),rep(0,180)),
Apr=c(rep(0,60),rep(1,20),rep(0,160)),
May=c(rep(0,80),rep(1,20),rep(0,140)),
Jun=c(rep(0,100),rep(1,20),rep(0,120)),
Jul=c(rep(0,120),rep(1,20),rep(0,100)),
Aug=c(rep(0,140),rep(1,20),rep(0,80)),
Sep=c(rep(0,160),rep(1,20),rep(0,60)),
Oct=c(rep(0,180),rep(1,20),rep(0,40)),
Nov=c(rep(0,200),rep(1,20),rep(0,20)),
Dec=c(rep(0,220),rep(1,20)))
model <- lm(one ~ two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)
summary(model)
model$se <- vcovHC(model)
推荐答案
您似乎要针对的是固定效果估算,尽管这个问题是在不久前提出的,但我遇到了同样的问题,这是我的解决方案:
固定效应可以通过在估计方程式中包含+ factor()
来控制:
What you seem to be aiming at is a fixed effects estimation, though this question was raised a while ago I ran into the same problem, here is my solution:
Fixed effects can be controlled for by including a + factor()
in your estimation equation:
因此,我首先创建了另一列:
So I created an additional column first:
# create an addtitional column in your data
dat$month <- "0"
#this column will contain the month, not a dummy for months
for (i in 1:length(dat$one)){
if (dat[i,"Jan"]==1){
dat[i,"month"]<- "Jan"}
if (dat[i,"Feb"]==1){
dat[i,"month"]<- "Feb"}
if (dat[i,"Mar"]==1){
dat[i,"month"]<- "Mar"}
if (dat[i,"Apr"]==1){
dat[i,"month"]<- "Apr"}
if (dat[i,"May"]==1){
dat[i,"month"]<- "May"}
if (dat[i,"Jun"]==1){
dat[i,"month"]<- "Jun"}
if (dat[i,"Jul"]==1){
dat[i,"month"]<- "Jul"}
if (dat[i,"Aug"]==1){
dat[i,"month"]<- "Aug"}
if (dat[i,"Sep"]==1){
dat[i,"month"]<- "Sep"}
if (dat[i,"Oct"]==1){
dat[i,"month"]<- "Oct"}
if (dat[i,"Nov"]==1){
dat[i,"month"]<- "Nov"}
if (dat[i,"Dec"]==1){
dat[i,"month"]<- "Dec"}
}
i <- NULL
此列现在可以用作回归方程式中的固定或恒定影响因子:
This column now can be used as the fixed or constant effect factor in the regression equation:
> #you can use the created column as fixed effect factor in your
+ regression
> model_A <- lm(one ~ two + three + factor(month), data=dat)
> summary(model_A)
Call:
lm(formula = one ~ two + three + factor(month), data = dat)
Residuals:
Min 1Q Median 3Q Max
-133.817 -55.636 3.329 56.768 126.772
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1143.95365 54.99724 20.800 <2e-16 ***
two -0.09670 0.06621 -1.460 0.1455
three 0.02446 0.06666 0.367 0.7141
factor(month)Aug 3.51788 22.09948 0.159 0.8737
factor(month)Dec 5.60192 22.41204 0.250 0.8029
factor(month)Feb -22.62460 22.10889 -1.023 0.3072
factor(month)Jan -13.89553 22.25117 -0.624 0.5329
factor(month)Jul -38.85320 22.13980 -1.755 0.0806 .
factor(month)Jun -14.09355 22.18707 -0.635 0.5259
factor(month)Mar -0.45055 22.13638 -0.020 0.9838
factor(month)May -7.58935 22.14137 -0.343 0.7321
factor(month)Nov -14.75156 22.27288 -0.662 0.5084
factor(month)Oct -26.20290 22.09416 -1.186 0.2369
factor(month)Sep -4.53159 22.26334 -0.204 0.8389
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared: 0.04381, Adjusted R-squared: -0.01119
F-statistic: 0.7966 on 13 and 226 DF, p-value: 0.6635
> #and also do the same without intercept if so needed
> model_B <- lm(one ~ 0 + two + three + factor(month), data=dat)
> summary(model_B)
Call:
lm(formula = one ~ 0 + two + three + factor(month), data = dat)
Residuals:
Min 1Q Median 3Q Max
-133.817 -55.636 3.329 56.768 126.772
Coefficients:
Estimate Std. Error t value Pr(>|t|)
two -0.09670 0.06621 -1.460 0.146
three 0.02446 0.06666 0.367 0.714
factor(month)Apr 1143.95365 54.99724 20.800 <2e-16 ***
factor(month)Aug 1147.47152 54.57201 21.027 <2e-16 ***
factor(month)Dec 1149.55556 53.52499 21.477 <2e-16 ***
factor(month)Feb 1121.32904 55.18864 20.318 <2e-16 ***
factor(month)Jan 1130.05812 52.79625 21.404 <2e-16 ***
factor(month)Jul 1105.10045 54.94940 20.111 <2e-16 ***
factor(month)Jun 1129.86010 53.85865 20.978 <2e-16 ***
factor(month)Mar 1143.50310 53.59603 21.336 <2e-16 ***
factor(month)May 1136.36429 53.38218 21.287 <2e-16 ***
factor(month)Nov 1129.20208 53.54934 21.087 <2e-16 ***
factor(month)Oct 1117.75075 55.35703 20.192 <2e-16 ***
factor(month)Sep 1139.42205 53.58611 21.263 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared: 0.9964, Adjusted R-squared: 0.9961
F-statistic: 4409 on 14 and 226 DF, p-value: < 2.2e-16
这使您可以对面板数据进行常规的OLS回归.
This lets you run a regular OLS regression on panel data.
这篇关于R计算具有奇异性的lm模型的鲁棒标准误差(vcovHC)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!