在R中的不平衡面板数据中创建滞后变量 [英] Create lagged variable in unbalanced panel data in R
问题描述
我想创建一个变量,其中包含组内前一年的变量值。
id日期值
1 1 1992 4.1
2 1 NA 4.5
3 1 1991 3.3
4 1 1990 5.3
5 1 1994 3.0
6 2 1992 3.2
7 2 1991 5.2
value_lagged
应在组内缺少上一年时丢失 - 因为它是组内的第一个日期如在行4,7中),或者因为在数据中存在年差距(如在行5中)。此外,当缺少当前时间(如第2行中所示)时,应缺少 value_lagged
。
id日期值value_lagged
1 1 1992 4.1 3.3
2 1 NA 4.5 NA
3 1 1991 3.3 5.3
4 1 1990 5.3 NA
5 1 1994 3.0 NA
6 2 1992 3.2 5.2
7 2 1991 5.2 NA
现在,在R中,我使用 data.table
package
DT = data.table(id = c(1,1,1,1,1) ,2,2),
date = c(1992,NA,1991,1990,1994,1992,1991),
value = c(4.1,4.5,3.3,5.3,3.0,3.2,5.2 )
)
setkey(DT,id,date)
DT [,value_lagged:= DT [J(id,date-1),value],]
DT [ .na(date),value_lagged:= NA,]
我。我想知道是否有更好的选择使用 data.table
, dplyr
或任何其他包。非常感谢!
在 Stata
tsset id date
pre>
gen value_lagged = L.value
解决方案在组内使用函数
定义tlag
由id
tlag < n = 1L,time){
index < - match(time-n,time,incomparable = NA)
x [index]
}
df%> ;%group_by(id)%>%mutate(value_lagged = tlag(value,1,time = date))
I'd like to create a variable containing the value of a variable in the previous year within a group.
id date value 1 1 1992 4.1 2 1 NA 4.5 3 1 1991 3.3 4 1 1990 5.3 5 1 1994 3.0 6 2 1992 3.2 7 2 1991 5.2
value_lagged
should be missing when the previous year is missing within a group - either because it is the first date within a group (as in row 4, 7), or because there are year gaps in the data (as in row 5). Also,value_lagged
should be missing when the current time is missing (as in row 2).This gives:
id date value value_lagged 1 1 1992 4.1 3.3 2 1 NA 4.5 NA 3 1 1991 3.3 5.3 4 1 1990 5.3 NA 5 1 1994 3.0 NA 6 2 1992 3.2 5.2 7 2 1991 5.2 NA
For now, in R, I use the
data.table
packageDT = data.table(id = c(1,1,1,1,1,2,2), date = c(1992,NA,1991,1990,1994,1992,1991), value = c(4.1,4.5,3.3,5.3,3.0,3.2,5.2) ) setkey(DT, id, date) DT[, value_lagged := DT[J(id, date-1), value], ] DT[is.na(date), value_lagged := NA, ]
It's fast but it seems somewhat error prone to me. I'd like to know if there are better alternatives using
data.table
,dplyr
, or any other package. Thanks a lot!
In
Stata
, one would do:tsset id date gen value_lagged=L.value
解决方案Using a function
tlag
within groups defined byid
tlag <- function(x, n = 1L, time) { index <- match(time - n, time, incomparables = NA) x[index] } df %>% group_by(id) %>% mutate(value_lagged = tlag(value, 1, time = date))
这篇关于在R中的不平衡面板数据中创建滞后变量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!