Compute rolling sum by id variables, with missing timepoints


Question

I'm trying to learn R, and there are a few things I've done for 10+ years in SAS that I can't quite figure out the best way to do in R. Take this data:

 id  class           t  count  desired
 --  -----  ----------  -----  -------
  1      A  2010-01-15      1        1
  1      A  2010-02-15      2        3
  1      B  2010-04-15      3        3
  1      B  2010-09-15      4        4
  2      A  2010-01-15      5        5
  2      B  2010-06-15      6        6
  2      B  2010-08-15      7       13
  2      B  2010-09-15      8       21

I want to calculate the column desired as a rolling sum of count, grouped by id and class, over a rolling window of 4 months (the current month plus the 3 before it). Notice that not all months are present for each combination of id and class.

In SAS I'd typically do this in one of 2 ways:

  1. RETAIN plus a BY on id and class.
  2. PROC SQL with a left join from df as df1 to df as df2 on id, class, and df1.d-df2.d falling within the appropriate window.
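For readers coming from R rather than SAS, the self-join in option 2 can be sketched in base R roughly as follows. This is only an illustration of the join logic, not a claim about the best approach. It assumes the date column that the list calls d is the t column of the sample data, and the names m, pairs, and rolling are made up for the example.

```r
# Sample data from the question (the precomputed `desired` column is left out)
df <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2, 2),
  class = c("A", "A", "B", "B", "A", "B", "B", "B"),
  t     = as.Date(c("2010-01-15", "2010-02-15", "2010-04-15", "2010-09-15",
                    "2010-01-15", "2010-06-15", "2010-08-15", "2010-09-15")),
  count = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Month index (year * 12 + month), so month differences are plain integers
df$m <- as.integer(format(df$t, "%Y")) * 12L + as.integer(format(df$t, "%m"))

# Self-join on id and class: each row is paired with every row of its group.
# Columns from the second copy get the (hypothetical) suffix ".prev".
pairs <- merge(df, df, by = c("id", "class"), suffixes = c("", ".prev"))

# Keep pairs where the second row lies in the 4-month window ending at the first
lag <- pairs$m - pairs$m.prev
pairs <- pairs[lag >= 0 & lag <= 3, ]

# Sum the windowed counts for each original row
res <- aggregate(count.prev ~ id + class + t, data = pairs, FUN = sum)
names(res)[names(res) == "count.prev"] <- "rolling"
res <- res[order(res$id, res$class, res$t), ]
```

Because every row always pairs with itself, no rows are lost in the filter, so an inner join is enough here and the left join from the SAS version is not needed.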

What is the best R approach to this type of problem?

t <- as.Date(c("2010-01-15","2010-02-15","2010-04-15","2010-09-15",
               "2010-01-15","2010-06-15","2010-08-15","2010-09-15"))
class <- c("A","A","B","B","A","B","B","B")
id <- c(1,1,1,1,2,2,2,2)
count <- seq(1,8,length.out=8)
desired <- c(1,3,3,4,5,6,13,21)
df <- data.frame(id,class,t,count,desired)

Recommended answer

I'm almost embarrassed to post this. I'm usually pretty good at these, but there's got to be a better way.

This first uses zoo's as.yearmon to reduce the dates to just month and year. It then reshapes the data to get one column for each id/class combination, fills in zeros for the months before, after, and missing from the data, and uses zoo's rollsumr to get the rolling sum. Finally, it pulls out just the months that appear in the data and merges the result back with the original data frame.

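library(reshape2)
library(zoo)
# Reduce each date to its month and year
df$yearmon <- as.yearmon(df$t)
# Reshape wide: one row per id/class combination, one column per observed month
dfa <- dcast(id + class ~ yearmon, data=df, value.var="count")
ida <- dfa[,1:2]
# Transpose so that rows are months and columns are id/class combinations
dfa <- t(as.matrix(dfa[,-c(1:2)]))
# Full monthly sequence, padded by 3 months on either side
months <- with(df, seq(min(yearmon)-3/12, max(yearmon)+3/12, by=1/12))
dfb <- array(dim=c(length(months), ncol(dfa)),
             dimnames=list(paste(months), colnames(dfa)))
dfb[rownames(dfa),] <- dfa
# Months with no observation contribute zero to the sum
dfb[is.na(dfb)] <- 0
# Right-aligned rolling sum over 4 months
dfb <- rollsumr(dfb,4, fill=0)
rownames(dfb) <- paste(months)
# Keep only the months that appear in the data
dfb <- dfb[rownames(dfa),]
# Back to long format, then merge onto the original data frame
dfc <- cbind(ida, t(dfb))
dfc <- melt(dfc, id.vars=c("class", "id"))
names(dfc)[3:4] <- c("yearmon", "desired2")
dfc$yearmon <- as.yearmon(dfc$yearmon)
out <- merge(df,dfc)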

> out
  id class  yearmon          t count desired desired2
1  1     A Feb 2010 2010-02-15     2       3        3
2  1     A Jan 2010 2010-01-15     1       1        1
3  1     B Apr 2010 2010-04-15     3       3        3
4  1     B Sep 2010 2010-09-15     4       4        4
5  2     A Jan 2010 2010-01-15     5       5        5
6  2     B Aug 2010 2010-08-15     7      13       13
7  2     B Jun 2010 2010-06-15     6       6        6
8  2     B Sep 2010 2010-09-15     8      21       21
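
As for the "better way" mentioned above, one possibility is to skip the reshaping and, for each row, sum the counts of the rows in the same id/class group whose month falls inside the window. The following is a minimal base R sketch of that idea, not part of the original answer. The names month_index, window, and rolling are illustrative, and the window is assumed to be the current month plus the 3 before it.

```r
# Sample data from the question, including the expected `desired` column
df <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2, 2, 2),
  class   = c("A", "A", "B", "B", "A", "B", "B", "B"),
  t       = as.Date(c("2010-01-15", "2010-02-15", "2010-04-15", "2010-09-15",
                      "2010-01-15", "2010-06-15", "2010-08-15", "2010-09-15")),
  count   = c(1, 2, 3, 4, 5, 6, 7, 8),
  desired = c(1, 3, 3, 4, 5, 6, 13, 21)
)

# Month index (year * 12 + month), so gaps between months are plain integers
month_index <- as.integer(format(df$t, "%Y")) * 12L +
  as.integer(format(df$t, "%m"))

window <- 4L  # window length in months, including the current month

# For each row, add up the counts of rows in the same id/class group whose
# month lies within the window ending at that row's month
df$rolling <- vapply(seq_len(nrow(df)), function(i) {
  same_group <- df$id == df$id[i] & df$class == df$class[i]
  lag <- month_index[i] - month_index
  sum(df$count[same_group & lag >= 0L & lag < window])
}, numeric(1))
```

Missing months need no zero-filling here because the window is defined on the month index rather than on row position. The scan compares every row with every other row, so it is quadratic in the number of rows; for large data a join-based or grouped approach would likely scale better.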
