How to find balanced panel data in R (aka, how to find which entries in a panel are complete over a given window)
Question
I have a big panel of data from Compustat. To it I am adding some hand-collected data (seriously hand-collected from a stack of old books). But I don't want to hand-collect for the entire panel, only a randomly selected subset. To find the larger set (from which I'm randomly selecting) I would like to start with the balanced panel from Compustat.
I see the plm library for working with unbalanced panels, but I would like to keep mine balanced. Is there a clean way to do this, short of searching for and throwing out firms (individuals in panelspeak) that don't span the whole sample period? Thanks!
Recommended answer
On second thought, there is a much easier way to do this.
Take a look at this:
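# Keep only the subjects that have exactly the expected number of observations.
data.with.only.complete.subjects.data <- function(xx, subject.column, number.of.observation.a.subject.should.have)
{
  subjects <- xx[, subject.column]
  # Count the rows available for each subject.
  num.of.observations.per.subject <- table(subjects)
  subjects.to.keep <- names(num.of.observations.per.subject)[num.of.observations.per.subject == number.of.observation.a.subject.should.have]
  # Flag the rows that belong to a complete subject.
  subset.by.me <- subjects %in% subjects.to.keep
  new.xx <- xx[subset.by.me, ]
  return(new.xx)
}

# Example: 4 subjects with 3 observations each, then drop two rows
# so that subjects 1 and 2 become incomplete.
xx <- data.frame(subject = rep(1:4, each = 3),
                 observation.per.subject = rep(1:3, 4))
xx.mis <- xx[-c(2, 5), ]
data.with.only.complete.subjects.data(xx.mis, 1, 3)  # only subjects 3 and 4 remain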
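The function above counts rows per subject, so it assumes the data have already been cut down to the sample window and contain no duplicate subject-period rows. The question asks about completeness over a given window, so here is a minimal base-R sketch of that variant. The helper name balanced_panel, the column names gvkey and fyear, and the toy data are all assumptions made for illustration; they are not taken from the question or the answer.

```r
# Hypothetical helper: keep only the firms observed in every period of `window`.
# id_col and time_col are column names supplied by the caller (assumed here).
balanced_panel <- function(panel, id_col, time_col, window) {
  # Restrict the panel to the sample window first.
  in_window <- panel[panel[[time_col]] %in% window, , drop = FALSE]
  # Count distinct (firm, period) pairs so duplicate rows do not inflate the count.
  pairs <- unique(in_window[, c(id_col, time_col)])
  counts <- table(pairs[[id_col]])
  complete_ids <- names(counts)[counts == length(unique(window))]
  in_window[as.character(in_window[[id_col]]) %in% complete_ids, , drop = FALSE]
}

# Toy panel: 3 firms, fiscal years 2000 to 2003 (made-up values).
panel <- data.frame(
  gvkey = rep(c("001", "002", "003"), each = 4),
  fyear = rep(2000:2003, times = 3),
  sales = 1:12
)
# Firm 002 loses year 2001; firm 003 loses year 2003.
panel <- panel[-c(6, 12), ]

# Firm 003 is still complete over 2000 to 2002, so only firm 002 is dropped.
balanced <- balanced_panel(panel, "gvkey", "fyear", window = 2000:2002)
```

From there, one possible way to draw the random subset for hand collection is sample(unique(balanced$gvkey), k), with k chosen by you.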