R: calculate the number of occurrences of a specific event in the past AND future with groupings

Posted: 2017/3/12 11:11:19. Tags: r, group-by, data.table, dplyr


Question

This question is a modification of a problem I posted earlier, where I have occurrences of a specific type on different days. This time they are assigned to multiple users, for example:

df = data.frame(user_id = c(rep(1:2, each=5)),
            cancelled_order = c(rep(c(0,1,1,0,0), 2)),
            order_date = as.Date(c('2015-01-28', '2015-01-31', '2015-02-08', '2015-02-23',  '2015-03-23',
                                   '2015-01-25', '2015-01-28', '2015-02-06', '2015-02-21',  '2015-03-26')))


user_id cancelled_order order_date
      1               0 2015-01-28
      1               1 2015-01-31
      1               1 2015-02-08
      1               0 2015-02-23
      1               0 2015-03-23
      2               0 2015-01-25
      2               1 2015-01-28
      2               1 2015-02-06
      2               0 2015-02-21
      2               0 2015-03-26

I would like to calculate

1) the number of cancelled orders that each customer is going to have in the next x days (e.g. 7, 14), excluding the current one, and

2) the number of cancelled orders that each customer had in the past x days (e.g. 7, 14), excluding the current one.

The desired output looks like this:

solution
user_id cancelled_order order_date plus14 minus14
      1               0 2015-01-28      2       0
      1               1 2015-01-31      1       0
      1               1 2015-02-08      0       1
      1               0 2015-02-23      0       0
      1               0 2015-03-23      0       0
      2               0 2015-01-25      2       0
      2               1 2015-01-28      1       0
      2               1 2015-02-06      0       1
      2               0 2015-02-21      0       0
      2               0 2015-03-26      0       0

A solution that fits this purpose well was presented by @joel.wilson using data.table:

library(data.table)
vec <- c(14, 30) # Specify desired ranges
setDT(df)[, paste0("x", vec) :=
            lapply(vec, function(i) sum(df$cancelled_order[between(df$order_date,
                                                                   order_date,
                                                                   order_date + i, # this part can be changed to reflect the past date ranges
                                                                   incbounds = FALSE)])),
          by = order_date]

However, it does not take grouping by user_id into account. When I tried to modify the formula by adding this grouping as by = c("user_id", "order_date") or by = list(user_id, order_date), it did not work. This seems like something very basic. Any hints on how to get around this detail?

Also, keep in mind that I'm after a solution that works, even if it is not based on the above code or data.table at all! Thanks!

Answer

Here's one way:

library(data.table)
orderDT = with(df, data.table(id = user_id, completed = !cancelled_order, d = order_date))

vec = list(minus = 14L, plus = 14L)
orderDT[, c("dplus", "dminus") := .(
  orderDT[!(completed)][orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)], on=.(id, d <= d_plus, d >= d_tom), .N, by=.EACHI]$N
  ,
  orderDT[!(completed)][orderDT[, .(id, d_minus = d - vec$minus, d_yest = d - 1L)], on=.(id, d >= d_minus, d <= d_yest), .N, by=.EACHI]$N
)]

    id completed          d dplus dminus
 1:  1      TRUE 2015-01-28     2      0
 2:  1     FALSE 2015-01-31     1      0
 3:  1     FALSE 2015-02-08     0      1
 4:  1      TRUE 2015-02-23     0      0
 5:  1      TRUE 2015-03-23     0      0
 6:  2      TRUE 2015-01-25     2      0
 7:  2     FALSE 2015-01-28     1      0
 8:  2     FALSE 2015-02-06     0      1
 9:  2      TRUE 2015-02-21     0      0
10:  2      TRUE 2015-03-26     0      0

(I found OP's column names cumbersome and so shortened them.)

How it works

Each of the columns can be run on its own, like

orderDT[!(completed)][orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)], on=.(id, d <= d_plus, d >= d_tom), .N, by=.EACHI]$N

And this can be broken down into steps by simplifying:

orderDT[!(completed)][
  orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)],
  on=.(id, d <= d_plus, d >= d_tom),
  .N,
  by=.EACHI]$N
# original version

orderDT[!(completed)][
  orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)],
  on=.(id, d <= d_plus, d >= d_tom),
  .N,
  by=.EACHI]
# don't extract the N column of counts

orderDT[!(completed)][
  orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)],
  on=.(id, d <= d_plus, d >= d_tom)]
# don't create the N column of counts

orderDT[!(completed)]
# don't do the join

orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)]
# see the second table used in the join

This uses a "non-equi" join, taking inequalities to define the date ranges. Here the window for dplus runs from the day after the order (d_tom) through d + 14 inclusive, matched within the same id; dminus mirrors it for the 14 days before. For more details, see the documentation page found by typing ?data.table.
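Since the question also asks for something that works without data.table, here is a minimal base R sketch that can be used to cross-check the counts above. It is not part of the original answer: the helper name count_cancelled and its lo/hi arguments are made up for illustration, and it assumes the same inclusive windows as the join (days 1 to 14 ahead, days 1 to 14 back). It compares every row against every other row, so it is only meant for small data.

```r
# Rebuild the example data from the question.
df <- data.frame(
  user_id = rep(1:2, each = 5),
  cancelled_order = rep(c(0, 1, 1, 0, 0), 2),
  order_date = as.Date(c("2015-01-28", "2015-01-31", "2015-02-08", "2015-02-23", "2015-03-23",
                         "2015-01-25", "2015-01-28", "2015-02-06", "2015-02-21", "2015-03-26"))
)

# Hypothetical helper (an assumption, not from the thread): for each row, sum the
# cancelled orders of the same user whose date lies between lo and hi days
# (inclusive) away from that row's order_date.
count_cancelled <- function(data, lo, hi) {
  vapply(seq_len(nrow(data)), function(i) {
    same_user <- data$user_id == data$user_id[i]
    # Date - Date yields a difftime in days; convert to a plain integer offset.
    offset <- as.integer(data$order_date - data$order_date[i])
    sum(data$cancelled_order[same_user & offset >= lo & offset <= hi])
  }, numeric(1))
}

df$plus14  <- count_cancelled(df, 1, 14)    # next 14 days, current day excluded
df$minus14 <- count_cancelled(df, -14, -1)  # past 14 days, current day excluded
print(df)
```

Because the offsets start at 1 and -1, the current order is never counted, which is the same role d_tom and d_yest play in the join. For large tables the non-equi join is likely the better choice, since this sketch does a full scan per row.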
