rollsum with fixed dates


Question

I have a data frame that looks like this:

user_id        date             price
2375 2012/12/12 00:00:00.000  47.900000
2375 2013/01/16 00:00:00.000  47.900000
2375 2013/01/16 00:00:00.000  47.900000
2375 2013/05/08 00:00:00.000  47.900000
2375 2013/06/01 00:00:00.000  47.900000
2375 2013/10/02 00:00:00.000  26.500000
2375 2014/01/22 00:00:00.000  47.900000
2375 2014/03/21 00:00:00.000  47.900000
2375 2014/05/24 00:00:00.000  47.900000
2375 2015/04/11 00:00:00.000  47.900000
7419 2012/12/12 00:00:00.000   7.174977
7419 2013/01/02 00:00:00.000  27.500000
7419 2013/01/18 00:00:00.000  22.901482
7419 2013/02/08 00:00:00.000  27.500000
7419 2013/03/06 00:00:00.000   8.200000
7419 2013/04/03 00:00:00.000  22.901482
7419 2013/04/03 00:00:00.000   8.200000
7419 2013/04/03 00:00:00.000   6.900000
7419 2013/04/17 00:00:00.000   7.500000
7419 2013/04/17 00:00:00.000   7.500000
7419 2013/05/23 00:00:00.000   7.500000
7419 2013/06/07 00:00:00.000  27.500000
7419 2013/06/07 00:00:00.000   7.500000
7419 2013/06/07 00:00:00.000   7.500000
7419 2013/06/07 00:00:00.000   5.829188
7419 2013/07/10 00:00:00.000  27.500000
7419 2013/08/21 00:00:00.000   7.500000
7419 2013/08/21 00:00:00.000  27.500000
7419 2013/09/06 00:00:00.000  27.500000
7419 2013/12/27 00:00:00.000   7.500000
7419 2014/01/10 00:00:00.000  27.500000
7419 2014/02/16 00:00:00.000  27.500000
7419 2014/05/14 00:00:00.000  41.900000
7419 2014/07/03 00:00:00.000  26.500000
7419 2014/09/26 00:00:00.000  26.500000
7419 2014/09/26 00:00:00.000   7.500000
7419 2014/10/22 00:00:00.000  27.500000
7419 2014/11/15 00:00:00.000   6.900000
7419 2014/11/27 00:00:00.000  26.500000
7419 2014/12/12 00:00:00.000  40.900000
7419 2015/01/14 00:00:00.000  27.200000
7419 2015/02/24 00:00:00.000  26.500000
7419 2015/03/17 00:00:00.000  40.900000
7419 2015/05/02 00:00:00.000  27.200000
7419 2015/05/02 00:00:00.000  26.500000
7419 2015/05/15 00:00:00.000   7.900000
7419 2015/05/20 00:00:00.000  27.500000
7419 2015/06/20 00:00:00.000   7.500000
7419 2015/06/26 00:00:00.000   7.500000
7419 2015/06/30 00:00:00.000  41.900000
7419 2015/07/16 00:00:00.000  78.500000
11860 2012/12/12 00:00:00.000   7.174977
11860 2012/12/12 00:00:00.000  21.500000
11860 2013/03/02 00:00:00.000  22.901482
11860 2013/03/02 00:00:00.000   8.200000
11860 2013/05/25 00:00:00.000  29.500000
11860 2013/05/25 00:00:00.000   7.500000

In reality, I have more than 40,000 user_ids. For each user, I want to calculate the sum of price over the previous 4 weeks (not counting the current week). However, the date range is fixed, from 12/12/2012 to 22/09/2015. To avoid looping over each user, I thought of something like:

df <- df %>% group_by(user_id) %>%
    mutate(price.lag1 = lag(prod_price, n = 1)) %>%
    mutate(amount4weeks = rollsum(x=price, 4, align = "right", fill = NA))

However, it gives me an error, and it would only treat the rows present in the data as the "dates".

How can I give rollsum specific dates, and/or how can I do what I want in a one-liner? My result should look like:

df$price4weeks = c(NA, 0.000000, 0.000000, 0.000000, 47.900000, 0.000000,  0.000000, 0.000000,  0.000000,  0.000000, NA, 7.174977, 27.500000, 22.901482, 27.500000,  8.200000,  8.200000,  8.200000,  6.900000,  6.900000,  0.000000, 7.500000,  7.500000,  7.500000,  7.500000,  0.000000,  0.000000,  0.000000, 27.500000,  0.000000,  7.500000,  0.000000,  0.000000,  0.000000,  0.000000, 0.000000,  7.500000, 27.500000,  6.900000, 33.400000,  0.000000,  0.000000, 26.500000,  0.000000,  0.000000, 26.500000, 34.400000, 27.500000,  7.500000,15.000000, 56.900000, NA, NA, 0.000000, 0.000000, 0.000000, 0.000000)

Let me know if I am missing something in my explanation.

Thanks!

Recommended answer

rollsum calculates the sum over a rolling window of k data points, not over calendar time. To use dplyr with weeks, you could add a week_number column to your data and then calculate the rolling sum using sapply over week_number. The code could look like:

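library(dplyr)
# Number the calendar weeks across the whole data set (date must already be POSIXct)
df <- mutate(df, week_number = cut.POSIXt(date, breaks = "week", labels = FALSE))
# For each row, sum that user's prices over the four weeks before the row's week
df_new <- df %>% group_by(user_id) %>%
      do(mutate(., total_4wk = sapply(week_number, function(n) sum(.$price[between(.$week_number, n - 4, n - 1)], na.rm = TRUE))))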
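
To see what the week-number approach computes, here is a minimal base R sketch on toy data. The data frame `toy`, its values, and the helper `prev_4wk_sum` are hypothetical names made up for illustration; they are not part of the original question or answer. It assumes dates are already POSIXct and relies on `cut` starting weeks on Monday by default.

```r
# Toy data (hypothetical): user 1 buys in weeks 1, 2 and 6; user 2 buys once in week 2
toy <- data.frame(
  user_id = c(1, 1, 1, 1, 2),
  date    = as.POSIXct(c("2013-01-07", "2013-01-09", "2013-01-14",
                         "2013-02-11", "2013-01-16"), tz = "UTC"),
  price   = c(10, 5, 20, 7, 3)
)

# Week index on one shared calendar, counted from the week of the earliest date
toy$week_number <- cut(toy$date, breaks = "week", labels = FALSE)

# For each row of one user's data, sum the prices that fall in the
# four weeks before that row's week (the row's own week is excluded)
prev_4wk_sum <- function(d) {
  sapply(d$week_number, function(n) {
    sum(d$price[d$week_number >= n - 4 & d$week_number <= n - 1], na.rm = TRUE)
  })
}

# Apply per user and put the results back in the original row order
toy$total_4wk <- unsplit(lapply(split(toy, toy$user_id), prev_4wk_sum), toy$user_id)

print(toy$total_4wk)
# 0 0 15 20 0
```

Because the window is defined on week numbers rather than on row positions, weeks with no purchases simply contribute nothing to the sum. The last purchase of user 1 (week 6) therefore picks up only the week 2 purchase, which is the behaviour rollsum cannot give when applied directly to the rows.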
