Producing a rolling average of ALL the previous observations per ID in an unbalanced panel data set

Question

I am trying to compute rolling means of an unbalanced data set. To illustrate my point I have produced this toy example of my data:

ID  year  Var   RollingAvg(Var)
1   2000  2     NA
1   2001  3     2
1   2002  4     2.5
1   2003  2     3
2   2001  2     NA
2   2002  5     2
2   2003  4     3.5

The column RollingAvg(Var) is what I want, but can't get. In words, I am looking for the rolling average of ALL the previous observations of Var for each ID. I have tried using rollapply and ddply from the zoo and plyr packages, but I can't see how to set the rolling window length to use ALL the previous observations for each ID. Maybe I should use the plm package instead? Any help is appreciated.

I have seen other posts on rolling means for BALANCED panel data sets, but I can't seem to extrapolate their answers to unbalanced data.

Thanks

M

Recommended answer

Using data.table:

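library(data.table)
d = data.table(your_df)   # your_df: data frame with columns ID, year, Var
setorder(d, ID, year)     # the running mean assumes time order within each ID

# Running mean of Var within each ID, shifted down one row so that each row
# only sees earlier observations; the first row of each ID gets NA.
d[, RollingAvg := {avg = cumsum(Var)/seq_len(.N);
                   c(NA, avg[-length(avg)])},
    by = ID]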

(or, simplified further)

d[, RollingAvg := c(NA, head(cumsum(Var)/(seq_len(.N)), -1)), by = ID]
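
As a quick check, here is a minimal sketch that applies the one-liner above to the toy data from the question. The data.table is typed in by hand from the question's table, and the setorder() call is an added assumption that rows should be sorted by year within each ID before the cumulative sum is taken.

```r
library(data.table)

# Toy data from the question; the panel is unbalanced because ID 2 starts in 2001
d <- data.table(
  ID   = c(1, 1, 1, 1, 2, 2, 2),
  year = c(2000, 2001, 2002, 2003, 2001, 2002, 2003),
  Var  = c(2, 3, 4, 2, 2, 5, 4)
)

# Assumption: make sure observations are in time order within each ID
setorder(d, ID, year)

# cumsum(Var) / seq_len(.N) is the running mean including the current row;
# dropping its last element and prepending NA shifts it down by one row,
# so each row holds the mean of all strictly earlier observations for that ID
d[, RollingAvg := c(NA, head(cumsum(Var) / seq_len(.N), -1)), by = ID]

print(d)
# RollingAvg column: NA 2 2.5 3 NA 2 3.5
```

The result matches the RollingAvg(Var) column in the question. Because the window is defined per group through .N, no window length has to be chosen, which is why the approach works on unbalanced panels.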
