相对窗口运行和通过data.table非等值连接 [英] relative windowed running sum through data.table non-equi join

查看：129 发布时间：2017/3/12 12:30:38 r data.table summarization

本文介绍了相对窗口运行和通过data.table非等值连接的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个数据集customerId，transactionDate，productId，purchaseQty加载到一个data.table中。对于每一行，我要计算前45天的总收入和平均销售量。

I have a data set customerId, transactionDate, productId, purchaseQty loaded into a data.table. for each row, I want to calculate the sum, and mean of purchaseQty for the prior 45 day

        productId customerID transactionDate purchaseQty
 1:    870826    1186951      2016-03-28      162000
 2:    870826    1244216      2016-03-31        5000
 3:    870826    1244216      2016-04-08        6500
 4:    870826    1308671      2016-03-28      221367
 5:    870826    1308671      2016-03-29       83633
 6:    870826    1308671      2016-11-29       60500

我正在寻找这样的输出：

I'm looking for an output like this:

    productId customerID transactionDate purchaseQty    sumWindowPurchases
 1:    870826    1186951      2016-03-28      162000                162000
 2:    870826    1244216      2016-03-31        5000                  5000
 3:    870826    1244216      2016-04-08        6500                 11500
 4:    870826    1308671      2016-03-28      221367                221367
 5:    870826    1308671      2016-03-29       83633                305000
 6:    870826    1308671      2016-11-29       60500                 60500

因此，sumWindowPurchases包含从当前交易日期起45天窗口内客户/产品的purchaseQty总和。一旦我有这个工作，抛出的平均值和其他计算我需要的应该是微不足道的。

so, sumWindowPurchases contains the sum of purchaseQty for the customer/product over a 45 day window from the current transaction date. Once i have that working, throwing the mean, and other calcs I need should be trivial

我回到我的SQL根，并想到一个自连接：

I went back to my SQL roots and thought of a self join:

select   DT.customerId, DT.transactionDate, DT.productId, sum(DT1.purchaseQty)
from     DT
         inner join DT as DT1 on 
             DT.customerId = DT1.customerId
             and DT.productId =  DT1.productId
             and DT1.transactionDate between DT.transactionDate and dateadd(day, -45, DT.transactionDate)

尝试使用data.dable语法将其翻译成R，我希望做某事像这样：

Trying to translate that into R using data.dable syntax, I was hoping to do something like this:

DT1 <- DT #alias. have confirmed this is just a pointer
DT[DT1[DT1$transactionDate >= DT$transactionDate - 45],
   .(sum(DT1$purchaseQty)), 
   by = .(DT$customerId , DT$transactionDate ), 
   on = .(customerId , DT1$transactionDate <= DT$TransactionDate), 
   allow.cartesian = TRUE]

我想我有一个2部分的问题。什么是R方式做到这一点。是一个data.table自联接正确的方法，或者woudl我更好地尝试使用Reduce函数？

I guess I have a 2 part question. What is the "R way" to do this. Is a data.table self join the correct approach, or woudl i be better of trying to use the Reduce function?

我怀疑自连接是唯一的方法滚动45天窗口在那里。所以第二部分是我需要一些帮助data.table语法来显式引用列来自哪个源表，因为它的自联接，他们有相同的列名。

I suspect the self join is the only way to get the rolling 45 day window in there. so part 2 is I need some help with the data.table syntax to explicitly reference which source table the column comes from, since its a self join and they have the same column names.

我一直在研究弗兰克与这个表达式有关的答案。

Ive been studying the answers that Frank linked to and have come up with this expression

DT[.(p = productId, c = customerID, t = transactionDate, start = transactionDate - 45),
        on = .(productId==p, customerID==c, transactionDate<=t, transactionDate>=start),
        allow.cartesian = TRUE, nomatch = 0]

生成此输出：

   productId customerID transactionDate purchaseQty transactionDate.1
1:    870826    1186951      2016-03-28      162000        2016-02-12
2:    870826    1244216      2016-03-31        5000        2016-02-15
3:    870826    1244216      2016-04-08        5000        2016-02-23
4:    870826    1244216      2016-04-08        6500        2016-02-23
5:    870826    1308671      2016-03-28      221367        2016-02-12
6:    870826    1308671      2016-03-29      221367        2016-02-13
7:    870826    1308671      2016-03-29       83633        2016-02-13
8:    870826    1308671      2016-11-29       60500        2016-10-15

这是非常接近，我需要得到我的最后一步。如果我可以总结这个输出的购买数量，按客户/产品/ transactionDate.1分组，我会有一些有用的。然而，我不能得到的语法下来，不是我理解transactionDate.1名字是从哪里来的

This is very close, to what i need to get to my final step. if i could sum the purchase quantities of this output, group by customer/product/transactionDate.1, i would have something useful. however, I cant get the syntax down for that, not do I understand where the transactionDate.1 name is coming from

相对窗口运行和通过data.table非等值连接 [英] relative windowed running sum through data.table non-equi join

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

相对窗口运行和通过data.table非等值连接 [英] relative windowed running sum through data.table non-equi join

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭