Residuals from first differenced regression on unbalanced panel

Question

I am trying to use plm to estimate a first differenced model on some unbalanced panel data. My model seems to work and I get coefficient estimates, but I want to know if there is a way to get the residual (or fitted value) per observation used.

I have run into two problems: I don't know how to attach the residuals to the observations they are associated with, and I seem to get an incorrect number of residuals.

If I retrieve the residuals from the estimated model using model.name$residuals, I get a vector that is shorter than model.name$model.

library(plm)
set.seed(1)  # make the random X reproducible
X <- rnorm(14)
Y <- c(.4,1,1.5,1.3,1,4,5,6.5,7.3,3.7,5,.7,4,6)
Time <- rep(1:5,times=2)
Time <- c(Time, c(1,2,4,5))
ID <- rep(1:2,each=5)
ID <- c(ID,c(3,3,3,3))
TestData <- data.frame("Y"=Y,"X"=X,"ID"=ID,"Time"=Time)
model.name <- plm(Y~X,data=TestData,index = c("ID","Time"),model="fd")

> length(model.name$residuals)
[1] 11
> nrow(model.name$model)
[1] 14

(Note: ID=3 is missing an observation for t=3)

Looking at model.name$model, I see it includes all observations, including t=1 for each ID. First differencing removes the t=1 observations, so in this case the two IDs with all time periods should each have 4 residuals, one for each remaining period. ID=3 should have a residual for t=2, none for t=3 as it is missing, none for t=4 as there is no previous value to difference against (due to the missing t=3 value), and then a residual for t=5.

From this it seems that there should be 10 residuals, but I have 11. I would appreciate any help with why there are this many residuals, and how to connect residuals to the correct index (ID and Time).
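The counting argument above can be checked with a few lines of base R (no plm needed). This is a minimal sketch that assumes exactly the Time/ID layout from the example; `has_prev` and `expected_per_id` are illustrative names, not plm objects.

```r
# Index layout from the example: ID 3 has no observation at Time 3
Time <- c(rep(1:5, times = 2), 1, 2, 4, 5)
ID   <- c(rep(1:2, each = 5), 3, 3, 3, 3)

# A first difference at time t exists only if the same ID is also observed at t - 1
has_prev <- mapply(function(id, t) any(ID == id & Time == t - 1), ID, Time)

# Usable differences (and hence residuals) per ID, and in total
expected_per_id <- tapply(has_prev, ID, sum)
print(expected_per_id)  # IDs 1, 2, 3: 4, 4, 2
print(sum(has_prev))    # 10
```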

Answer

The lagging done with model="fd" is based on the neighbouring rows, not on the actual value of the time index. Thus, if you have non-consecutive time periods, this will give you unexpected results: for ID=3 the rows for t=2 and t=4 are adjacent, so a difference spanning the missing t=3 is still computed, and that is the 11th residual. To avoid this, do the differencing yourself while respecting the time period when lagging, and estimate a pooling model. The unbalancedness of the data is not of concern here.
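To see where the 11th residual comes from, the sketch below mimics row-based differencing in base R and flags pairs of adjacent rows whose Time values are more than one period apart. It assumes the data are sorted by ID and then Time, as in the example; it illustrates the row-versus-time distinction described above and is not plm's internal code.

```r
Time <- c(rep(1:5, times = 2), 1, 2, 4, 5)
ID   <- c(rep(1:2, each = 5), 3, 3, 3, 3)

# Row-based differencing: each row is paired with the previous row of the same ID,
# regardless of the actual Time values
prev_row_same_id <- c(FALSE, ID[-1] == ID[-length(ID)])
n_row_based <- sum(prev_row_same_id)

# Time gap between the paired rows; a gap > 1 means the pair spans a missing period
gap <- c(NA, diff(Time))
gap[!prev_row_same_id] <- NA
n_time_based <- sum(gap == 1, na.rm = TRUE)

print(n_row_based)   # 11
print(n_time_based)  # 10

# The extra row-based difference: ID 3, Time 4 paired with Time 2
print(data.frame(ID, Time)[which(gap > 1), ])
```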

Since version 1.7.0 of package plm, the lag() function lags based on the value of the time period by default (previously the default was neighbouring rows). Use this function to do the lagging yourself.
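For readers who want to see what time-based lagging means without plm, here is a base-R sketch. `lag_by_time()` is a hypothetical helper written for this illustration, not part of plm: it looks up the value at Time - 1 for the same ID and returns NA when that period is not observed, which is the behaviour described above for plm::lag() since 1.7.0.

```r
# Hypothetical helper (not part of plm): value of x at time - 1 for the same id,
# or NA when that period is not observed
lag_by_time <- function(x, id, time) {
  key      <- paste(id, time)       # key of each observation
  prev_key <- paste(id, time - 1)   # key of the period to look up
  x[match(prev_key, key)]           # match() gives NA when no such row exists
}

Y    <- c(.4, 1, 1.5, 1.3, 1, 4, 5, 6.5, 7.3, 3.7, 5, .7, 4, 6)
Time <- c(rep(1:5, times = 2), 1, 2, 4, 5)
ID   <- c(rep(1:2, each = 5), 3, 3, 3, 3)

Y_lag <- lag_by_time(Y, ID, Time)
print(data.frame(ID, Time, Y, Y_lag)[ID == 3, ])
# Y_lag for ID 3 is NA, 5, NA, 4: no lag at t = 1, nor at t = 4 (t = 3 is missing)
```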

Continuing your example:

pTestData <- pdata.frame(TestData, index=c("ID", "Time"))

# first difference x_t - x_{t-1}; plm::lag() gives NA where t-1 is not observed
pTestData$Y_diff <- pTestData$Y - plm::lag(pTestData$Y)
pTestData$X_diff <- pTestData$X - plm::lag(pTestData$X)
fdmod <- plm(Y_diff ~ X_diff, data = pTestData, model = "pooling")
length(residuals(fdmod)) # 10
nrow(fdmod$model)        # 10

I explicitly used plm:: when referring to the lag function because several other packages have a lag function as well (most notably stats and dplyr), and you want the one from package plm here. To attach the residuals to the differenced data (the data actually used to compute the model), just do something like: dat <- cbind(fdmod$model, residuals(fdmod))
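If the goal is to see each residual next to its ID and Time, one option is to keep the differenced variables in the original data frame and let the fitting function pad the residuals with NA for the rows it dropped. The sketch below does this in base R, using lm() with na.action = na.exclude as a stand-in for plm's pooling model (both fit OLS with an intercept on the differenced data); the seed and the added column names are illustrative assumptions.

```r
set.seed(1)  # illustrative seed so X is reproducible
TestData <- data.frame(
  Y    = c(.4, 1, 1.5, 1.3, 1, 4, 5, 6.5, 7.3, 3.7, 5, .7, 4, 6),
  X    = rnorm(14),
  ID   = c(rep(1:2, each = 5), 3, 3, 3, 3),
  Time = c(rep(1:5, times = 2), 1, 2, 4, 5)
)

# Row of the same ID at Time - 1 (NA if that period is not observed)
prev <- match(paste(TestData$ID, TestData$Time - 1),
              paste(TestData$ID, TestData$Time))

# First differences x_t - x_{t-1}; NA where no previous period exists
TestData$Y_diff <- TestData$Y - TestData$Y[prev]
TestData$X_diff <- TestData$X - TestData$X[prev]

# OLS on the differences; na.exclude drops incomplete rows from the fit
# but pads residuals() back to one value per row of TestData
fit <- lm(Y_diff ~ X_diff, data = TestData, na.action = na.exclude)
TestData$resid <- residuals(fit)

print(TestData[, c("ID", "Time", "Y_diff", "X_diff", "resid")])
```

Rows with NA in `resid` are exactly the observations that had no previous period to difference against.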

Also, you might be interested in the function is.pconsecutive to check the consecutiveness of your data:

is.pconsecutive(pTestData)
#    1     2     3 
# TRUE  TRUE FALSE 
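As a base-R illustration of what is being checked (not plm's implementation), an ID is consecutive when its sorted time values increase in steps of exactly one. This sketch assumes integer-valued time periods and no missing values in the index.

```r
Time <- c(rep(1:5, times = 2), 1, 2, 4, 5)
ID   <- c(rep(1:2, each = 5), 3, 3, 3, 3)

# TRUE for an ID if its sorted time values have no gaps
consecutive <- tapply(Time, ID, function(t) all(diff(sort(t)) == 1))
print(consecutive)  # TRUE TRUE FALSE for IDs 1, 2, 3
```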

The function make.pconsecutive makes your data consecutive by inserting rows with NA values for the missing periods.
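A base-R sketch of the same idea, for illustration only: for each ID it builds the full run of periods between the first and last observed Time and left-joins the data onto it, so the missing period appears as a row of NAs. This is intended to mirror make.pconsecutive's default of filling only gaps inside each individual's observed range; it assumes integer time periods, and the object names (`full_index`, `d_consec`) are made up for the example.

```r
d <- data.frame(
  ID   = c(rep(1:2, each = 5), 3, 3, 3, 3),
  Time = c(rep(1:5, times = 2), 1, 2, 4, 5),
  Y    = c(.4, 1, 1.5, 1.3, 1, 4, 5, 6.5, 7.3, 3.7, 5, .7, 4, 6)
)

# For each ID, every period from its first to its last observed Time
full_index <- do.call(rbind, lapply(split(d, d$ID), function(g) {
  data.frame(ID = g$ID[1], Time = seq(min(g$Time), max(g$Time)))
}))

# Left join: periods with no observation get NA in the data columns
d_consec <- merge(full_index, d, by = c("ID", "Time"), all.x = TRUE)
d_consec <- d_consec[order(d_consec$ID, d_consec$Time), ]
rownames(d_consec) <- NULL

print(d_consec[d_consec$ID == 3, ])  # Time 1 to 5, with Y = NA at Time 3
```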
