R计算具有奇异性的lm模型的鲁棒标准误差(vcovHC) [英] R calculate robust standard errors (vcovHC) for lm model with singularities

查看:252
本文介绍了R计算具有奇异性的lm模型的鲁棒标准误差(vcovHC)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在R中,当由于奇异性而使某些系数下降时,如何使用vcovHC()计算鲁棒的标准误差?标准lm函数似乎可以很好地计算出所有实际估计的系数的正常标准误差,但是vcovHC()会引发错误:面包错误.%*%肉类.:不合格参数".

In R, how can I calculate robust standard errors using vcovHC() when some coefficients are dropped due to singularities? The standard lm function seems to do fine calculating normal standard errors for all coefficients that are actually estimated, but vcovHC() throws an error: "Error in bread. %*% meat. : non-conformable arguments".

(我正在使用的实际数据要复杂一些.实际上,它是一个使用两个不同固定效果的模型,我遇到了局部奇异点,我无法简单地摆脱它们.至少我不知道如何.对于我使用的两个固定效果,第一个因子具有150个级别,第二个因子具有142个级别,并且总共有9个奇点,这是因为数据是在10个块中收集的.)

(The actual data I'm using is a bit more complicated. In fact, it is a model using two different fixed effects and I run into local singularities which I cannot simply get rid of. At least I would not know how. For the two fixed effects I'm using the first factor has 150 levels, the second has 142 levels and there are in total 9 singularities which result from the fact that the data was collected in ten blocks.)

这是我的输出:

Call:
lm(formula = one ~ two + three + Jan + Feb + Mar + Apr + May + 
Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-130.12  -60.95    0.08   61.05  137.35 

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1169.74313   57.36807  20.390   <2e-16 ***
two           -0.07963    0.06720  -1.185    0.237    
three         -0.04053    0.06686  -0.606    0.545    
Jan            8.10336   22.05552   0.367    0.714    
Feb            0.44025   22.11275   0.020    0.984    
Mar           19.65066   22.02454   0.892    0.373    
Apr          -13.19779   22.02886  -0.599    0.550    
May           15.39534   22.10445   0.696    0.487    
Jun          -12.50227   22.07013  -0.566    0.572    
Jul          -20.58648   22.06772  -0.933    0.352    
Aug           -0.72223   22.36923  -0.032    0.974    
Sep           12.42204   22.09296   0.562    0.574    
Oct           25.14836   22.04324   1.141    0.255    
Nov           18.13337   22.08717   0.821    0.413    
Dec                 NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 69.63 on 226 degrees of freedom
Multiple R-squared: 0.04878,    Adjusted R-squared: -0.005939 
F-statistic: 0.8914 on 13 and 226 DF,  p-value: 0.5629 

> model$se <- vcovHC(model)
Error in bread. %*% meat. : non-conformable arguments

以下是用于重现该错误的最少代码.

Here is a minimal code snipped to reproduce the error.

library(sandwich)
set.seed(101)
dat<-data.frame(one=c(sample(1000:1239)),
              two=c(sample(200:439)),
              three=c(sample(600:839)),
              Jan=c(rep(1,20),rep(0,220)),
              Feb=c(rep(0,20),rep(1,20),rep(0,200)),
              Mar=c(rep(0,40),rep(1,20),rep(0,180)),
              Apr=c(rep(0,60),rep(1,20),rep(0,160)),
              May=c(rep(0,80),rep(1,20),rep(0,140)),
              Jun=c(rep(0,100),rep(1,20),rep(0,120)),
              Jul=c(rep(0,120),rep(1,20),rep(0,100)),
              Aug=c(rep(0,140),rep(1,20),rep(0,80)),
              Sep=c(rep(0,160),rep(1,20),rep(0,60)),
              Oct=c(rep(0,180),rep(1,20),rep(0,40)),
              Nov=c(rep(0,200),rep(1,20),rep(0,20)),
              Dec=c(rep(0,220),rep(1,20))) 
model <- lm(one ~ two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)
summary(model)
model$se <- vcovHC(model)

推荐答案

您似乎要针对的是固定效果估算,尽管这个问题是在不久前提出的,但我遇到了同样的问题,这是我的解决方案: 固定效应可以通过在估计方程式中包含+ factor()来控制:

What you seem to be aiming at is a fixed effects estimation, though this question was raised a while ago I ran into the same problem, here is my solution: Fixed effects can be controlled for by including a + factor() in your estimation equation:

因此,我首先创建了另一列:

So I created an additional column first:

# create an addtitional column in your data 
dat$month <- "0"
#this column will contain the month, not a dummy for months
for	(i in 1:length(dat$one)){
	if (dat[i,"Jan"]==1){
	dat[i,"month"]<- "Jan"}
	if (dat[i,"Feb"]==1){
	dat[i,"month"]<- "Feb"}
	if (dat[i,"Mar"]==1){
	dat[i,"month"]<- "Mar"}
	if (dat[i,"Apr"]==1){
	dat[i,"month"]<- "Apr"}
	if (dat[i,"May"]==1){
	dat[i,"month"]<- "May"}
	if (dat[i,"Jun"]==1){
	dat[i,"month"]<- "Jun"}
	if (dat[i,"Jul"]==1){
	dat[i,"month"]<- "Jul"}
	if (dat[i,"Aug"]==1){
	dat[i,"month"]<- "Aug"}
	if (dat[i,"Sep"]==1){
	dat[i,"month"]<- "Sep"}
	if (dat[i,"Oct"]==1){
	dat[i,"month"]<- "Oct"}
	if (dat[i,"Nov"]==1){
	dat[i,"month"]<- "Nov"}
	if (dat[i,"Dec"]==1){
	dat[i,"month"]<- "Dec"}
}
i <- NULL

此列现在可以用作回归方程式中的固定或恒定影响因子:

This column now can be used as the fixed or constant effect factor in the regression equation:

> #you can use the created column as fixed effect factor in your 
+ regression 
> model_A <- lm(one ~ two + three + factor(month), data=dat)
> summary(model_A)

Call:
lm(formula = one ~ two + three + factor(month), data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-133.817  -55.636    3.329   56.768  126.772 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)      1143.95365   54.99724  20.800   <2e-16 ***
two                -0.09670    0.06621  -1.460   0.1455    
three               0.02446    0.06666   0.367   0.7141    
factor(month)Aug    3.51788   22.09948   0.159   0.8737    
factor(month)Dec    5.60192   22.41204   0.250   0.8029    
factor(month)Feb  -22.62460   22.10889  -1.023   0.3072    
factor(month)Jan  -13.89553   22.25117  -0.624   0.5329    
factor(month)Jul  -38.85320   22.13980  -1.755   0.0806 .  
factor(month)Jun  -14.09355   22.18707  -0.635   0.5259    
factor(month)Mar   -0.45055   22.13638  -0.020   0.9838    
factor(month)May   -7.58935   22.14137  -0.343   0.7321    
factor(month)Nov  -14.75156   22.27288  -0.662   0.5084    
factor(month)Oct  -26.20290   22.09416  -1.186   0.2369    
factor(month)Sep   -4.53159   22.26334  -0.204   0.8389    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared:  0.04381,   Adjusted R-squared:  -0.01119 
F-statistic: 0.7966 on 13 and 226 DF,  p-value: 0.6635

> #and also do the same without intercept if so needed
> model_B <- lm(one ~ 0 + two + three + factor(month), data=dat)
> summary(model_B)

Call:
lm(formula = one ~ 0 + two + three + factor(month), data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-133.817  -55.636    3.329   56.768  126.772 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
two                -0.09670    0.06621  -1.460    0.146    
three               0.02446    0.06666   0.367    0.714    
factor(month)Apr 1143.95365   54.99724  20.800   <2e-16 ***
factor(month)Aug 1147.47152   54.57201  21.027   <2e-16 ***
factor(month)Dec 1149.55556   53.52499  21.477   <2e-16 ***
factor(month)Feb 1121.32904   55.18864  20.318   <2e-16 ***
factor(month)Jan 1130.05812   52.79625  21.404   <2e-16 ***
factor(month)Jul 1105.10045   54.94940  20.111   <2e-16 ***
factor(month)Jun 1129.86010   53.85865  20.978   <2e-16 ***
factor(month)Mar 1143.50310   53.59603  21.336   <2e-16 ***
factor(month)May 1136.36429   53.38218  21.287   <2e-16 ***
factor(month)Nov 1129.20208   53.54934  21.087   <2e-16 ***
factor(month)Oct 1117.75075   55.35703  20.192   <2e-16 ***
factor(month)Sep 1139.42205   53.58611  21.263   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared:  0.9964,    Adjusted R-squared:  0.9961 
F-statistic:  4409 on 14 and 226 DF,  p-value: < 2.2e-16

这使您可以对面板数据进行常规的OLS回归.

This lets you run a regular OLS regression on panel data.

这篇关于R计算具有奇异性的lm模型的鲁棒标准误差(vcovHC)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆