时间间隔不均匀的小组的总和 [英] Rolling sums for groups with uneven time gaps

查看:104
本文介绍了时间间隔不均匀的小组的总和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我以前发布的中的调整问题。这是我的数据:

Here's the tweak to my previously posted question. Here's my data:

set.seed(3737)
DF2 = data.frame(user_id = c(rep(27, 7), rep(11, 7)),
            date = as.Date(rep(c('2016-01-01', '2016-01-03', '2016-01-05', '2016-01-07', '2016-01-10', '2016-01-14', '2016-01-16'), 2)),
            value = round(rnorm(14, 15, 5), 1))

 user_id  date        value
 27       2016-01-01  15.0
 27       2016-01-03  22.4
 27       2016-01-05  13.3
 27       2016-01-07  21.9
 27       2016-01-10  20.6
 27       2016-01-14  18.6
 27       2016-01-16  16.4
 11       2016-01-01   6.8
 11       2016-01-03  21.3
 11       2016-01-05  19.8
 11       2016-01-07  22.0
 11       2016-01-10  19.4
 11       2016-01-14  17.5
 11       2016-01-16  19.3

这次,我想计算指定时间段内每个 user_id 的累积总和';例如最近7、14天。理想的解决方案如下所示:

This time, I'd like to calculate cumulative sum of a value for each user_id for the specified time period'; e.g. last 7, 14 days. The desirable solution would look like this:

 user_id  date        value    v_minus7 v_minus14
 27       2016-01-01  15.0     15.0      15.0
 27       2016-01-03  22.4     37.4      37.4
 27       2016-01-05  13.3     50.7      50.7
 27       2016-01-07  21.9     72.6      72.6
 27       2016-01-10  20.6     78.2      93.2
 27       2016-01-14  18.6     61.1     111.8
 27       2016-01-16  16.4     55.6     113.2
 11       2016-01-01   6.8      6.8       6.8
 11       2016-01-03  21.3     28.1      28.1
 11       2016-01-05  19.8     47.9      47.9
 11       2016-01-07  22.0     69.9      69.9
 11       2016-01-10  19.4     82.5      89.3
 11       2016-01-14  17.5     58.9     106.8
 11       2016-01-16  19.3     56.2     119.3

我想为此使用 dplyr ,但其他软件包也可以。

Ideally, I'd like to use dplyr for this, but other packages would be fine.

推荐答案

logic:按 user_id 分组的第一组,然后按 date 组。现在,对于每个数据子集,我们使用 between()返回一个逻辑向量,来检查哪些日期位于当前日期和7/14天之间。

logic : first group by user_id, followed by date. Now for each subset of data, we are checking which all dates lie between the current date and 7/14 days back using between() which returns a logical vector.

基于此逻辑向量,我添加了

Based on this logical vector I add the value column

library(data.table)
setDT(DF2)[, `:=`(v_minus7 = sum(DF2$value[DF2$user_id == user_id][between(DF2$date[DF2$user_id == user_id], date-7, date, incbounds = TRUE)]), 
                 v_minus14 = sum(DF2$value[DF2$user_id == user_id][between(DF2$date[DF2$user_id == user_id], date-14, date, incbounds = TRUE)])),
           by = c("user_id", "date")][]
 #   user_id       date value v_minus7 v_minus14
 #1:      27 2016-01-01  15.0     15.0      15.0
 #2:      27 2016-01-03  22.4     37.4      37.4
 #3:      27 2016-01-05  13.3     50.7      50.7
 #4:      27 2016-01-07  21.9     72.6      72.6
 #5:      27 2016-01-10  20.6     78.2      93.2
 #6:      27 2016-01-14  18.6     61.1     111.8
 #7:      27 2016-01-16  16.4     55.6     113.2
 #8:      11 2016-01-01   6.8      6.8       6.8
 #9:      11 2016-01-03  21.3     28.1      28.1
#10:      11 2016-01-05  19.8     47.9      47.9
#11:      11 2016-01-07  22.0     69.9      69.9
#12:      11 2016-01-10  19.4     82.5      89.3
#13:      11 2016-01-14  17.5     58.9     106.8
#14:      11 2016-01-16  19.3     56.2     119.3

$来自alexis_laz答案的b
$ b





# from alexis_laz answer.
ff = function(date, value, minus){
  cs = cumsum(value)  
  i = findInterval(date - minus, date, rightmost.closed = TRUE) 
  w = which(as.logical(i))
  i[w] = cs[i[w]]
  cs - i
} 
setDT(DF2)
DF2[, `:=`( v_minus7 = ff(date, value, 7), 
            v_minus14 = ff(date, value, 14)), by = c("user_id")]

这篇关于时间间隔不均匀的小组的总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆