Subsetting an unbalanced panel dataset to have at least 2 consecutive observations in R
Question
I have an unbalanced panel dataset in R. The following will serve as an example:
dt <- data.frame(name = rep(c("A", "B", "C"), c(3, 2, 3)),
                 year = c(2001:2003, 2000, 2002, 2000:2001, 2003))
> dt
  name year
1    A 2001
2    A 2002
3    A 2003
4    B 2000
5    B 2002
6    C 2000
7    C 2001
8    C 2003
Now, I need to have at least 2 consecutive year observations for each name. Hence, I would like to remove rows 4, 5, and 8. How do I best do that in R?
Thanks to the comment below, I can make this a bit clearer. If I had an extra observation (row 9) with name=C and year=2004, I would want to keep both rows 8 and 9 along with the others.
Recommended answer
My (hackish) way to do it would be:
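# Stack dt on top of copies shifted by +1 and -1 year. A row of dt that
# reappears in a shifted copy has a neighbouring year for the same name.
is.consecutive = duplicated(rbind(dt,
                                  transform(dt, year = year + 1),
                                  transform(dt, year = year - 1)),
                            fromLast = TRUE)[1:nrow(dt)]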
is.consecutive contains a logical vector marking the observations to be retained. For your example, this vector would be: TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE
Finally, you can easily use this vector to subset your data.frame, e.g. with:
dt[is.consecutive,]
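One caveat with the duplicated() trick: it compares whole rows, so it presumably only works as intended when dt has no columns besides name and year and no repeated name/year pairs. As a rough alternative sketch (not part of the original answer), the same flag can be computed per name with ave(). The helper has_neighbour and the objects keep, dt9 and keep9 are names made up for this illustration.

```r
# Example data from the question
dt <- data.frame(name = rep(c("A", "B", "C"), c(3, 2, 3)),
                 year = c(2001:2003, 2000, 2002, 2000:2001, 2003))

# Hypothetical helper: TRUE for every year that has the previous or the
# next year present in the same vector
has_neighbour <- function(y) (y - 1) %in% y | (y + 1) %in% y

# Apply the helper within each name; ave() returns 0/1 here because the
# year column is numeric, so convert back to logical
keep <- as.logical(ave(dt$year, dt$name, FUN = has_neighbour))
dt[keep, ]

# The extra observation from the question (name = C, year = 2004)
dt9 <- rbind(dt, data.frame(name = "C", year = 2004))
keep9 <- as.logical(ave(dt9$year, dt9$name, FUN = has_neighbour))
dt9[keep9, ]
```

On the example data this keeps rows 1, 2, 3, 6 and 7, and with the extra 2004 row it also keeps rows 8 and 9, which matches what the question asks for. Because only the year column is compared within each name, additional columns in dt should not affect the result.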