Generating a moving sum variable in R


Problem description

I suspect this is a somewhat simple question with multiple solutions, but I'm still a bit of a novice in R and an exhaustive search didn't yield answers that spoke well to what I'm wanting to do.

I'm trying to create, for lack of better term, "moving sums" for a variable in my data frame. These would be 3-year and 5-year sums, lagged one year. So, a 5-year sum for an observation in 1986 would be the sum of all previous observations in 1981, 1982, 1983, 1984, and 1985. Here is an example of what I would like to do, where the sum variable is the sum of all x in the five years prior to the observation year.

country     year      x      x5yrsum
  A         1980      9        NA
  A         1981      3        NA
  A         1982      5        NA
  A         1983      6        NA
  A         1984      9        NA
  A         1985      7        32
  A         1986      9        30
  A         1987      4        36

  .....................

  B         1990      0        NA
  B         1991      4        NA
  B         1992      2        NA
  B         1993      6        NA
  B         1994      3        NA
  B         1995      7        15
  B         1996      0        22

This is unbalanced panel data. I suspect ddply would be appropriate, but I wouldn't know the exact coding for it.

Any input would be appreciated.

Solution

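You can use filter (the base R function stats::filter, which dplyr masks if it is loaded) in ddply (or any other function implementing the "split-apply-combine" approach):

library(plyr)
# The leading 0 weight skips the current year; the five 1s sum the preceding five rows.
# Assumes rows are sorted by year, with no gaps in years within a country.
ddply(DF, .(country), transform,
      x5yrsum2 = as.numeric(stats::filter(x, c(0, rep(1, 5)), sides = 1)))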

#    country year x x5yrsum x5yrsum2
# 1        A 1980 9      NA       NA
# 2        A 1981 3      NA       NA
# 3        A 1982 5      NA       NA
# 4        A 1983 6      NA       NA
# 5        A 1984 9      NA       NA
# 6        A 1985 7      32       32
# 7        A 1986 9      30       30
# 8        A 1987 4      36       36
# 9        B 1990 0      NA       NA
# 10       B 1991 4      NA       NA
# 11       B 1992 2      NA       NA
# 12       B 1993 6      NA       NA
# 13       B 1994 3      NA       NA
# 14       B 1995 7      15       15
# 15       B 1996 0      22       22
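
The question also asks for 3-year sums. The following is a minimal sketch of how the same filter idea might be generalised to any window length, using base R's ave() in place of ddply so that no extra package is needed. The helper name lagged_sum, the column name x3yrsum, and the consecutive-year check are assumptions added for illustration and are not part of the original answer. The data frame is rebuilt from the rows shown in the question.

```r
# Example rows copied from the question; the name DF matches the answer above.
DF <- data.frame(
  country = rep(c("A", "B"), times = c(8, 7)),
  year    = c(1980:1987, 1990:1996),
  x       = c(9, 3, 5, 6, 9, 7, 9, 4, 0, 4, 2, 6, 3, 7, 0)
)

# Hypothetical helper: sum of the k values preceding each element of v.
# The leading 0 weight drops the current element, giving the one-year lag.
lagged_sum <- function(v, k) {
  # stats::filter() stops with an error if the filter is longer than the series,
  # so short groups get all-NA results instead.
  if (length(v) < k + 1) return(rep(NA_real_, length(v)))
  as.numeric(stats::filter(v, c(0, rep(1, k)), sides = 1))
}

# Row-based windows only match calendar years if rows are sorted by year
# and each country has no missing years, so check that first.
DF <- DF[order(DF$country, DF$year), ]
year_step <- ave(DF$year, DF$country, FUN = function(y) c(1, diff(y)))
stopifnot(all(year_step == 1))

# ave() applies the helper within each country and keeps the row order.
DF$x3yrsum <- ave(DF$x, DF$country, FUN = function(v) lagged_sum(v, 3))
DF$x5yrsum <- ave(DF$x, DF$country, FUN = function(v) lagged_sum(v, 5))

print(DF)
```

On the example rows, the x5yrsum column should reproduce the values given in the question. If a country had missing years, the stopifnot() check would fail. In that case a window defined on the year values, not on row positions, would be needed.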
