Creating a new R data.table column based on values in another column and grouping

Question

I have a data.table with date, zipcode and purchase amounts.

library(data.table)
set.seed(88)
DT <- data.table(date = Sys.Date()-365 + sort(sample(1:100, 10)), 
zip = sample(c("2000", "1150", "3000"),10, replace = TRUE), 
purchaseAmount = sample(1:20, 10))  

This creates the following:

          date  zip purchaseAmount
 1: 2016-01-08 1150              5
 2: 2016-01-15 3000             15
 3: 2016-02-15 1150             16
 4: 2016-02-20 2000             18
 5: 2016-03-07 2000             19
 6: 2016-03-15 2000             11
 7: 2016-03-17 2000              6
 8: 2016-04-02 1150             17
 9: 2016-04-08 3000              7
10: 2016-04-09 3000             20

I would like to add a fourth column, earlierPurchases. For each row, this column should sum all the purchaseAmount values in the same zipcode over the previous x days, including the current row's own purchase.

As suggested by Frank, here is the expected output:

          date  zip purchaseAmount new_col
 1: 2016-01-08 1150              5       5
 2: 2016-01-15 3000             15      15
 3: 2016-02-15 1150             16      16
 4: 2016-02-20 2000             18      18
 5: 2016-03-07 2000             19      19
 6: 2016-03-15 2000             11      30
 7: 2016-03-17 2000              6      36
 8: 2016-04-02 1150             17      17
 9: 2016-04-08 3000              7       7
10: 2016-04-09 3000             20      27

Is there a data.table way to do this, or should I just write a looping function?
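For comparison, here is a minimal sketch of the looping approach the question mentions. It assumes a 10-day window (the value the answer below uses), stores the result in a hypothetical column loopCol, and rebuilds DT from the printed table instead of from set.seed()/Sys.Date(), so the numbers line up with the expected output.

```r
library(data.table)

# Rows copied from the table printed in the question, so the result does not
# depend on Sys.Date() or on the random number generator.
DT <- data.table(
  date = as.Date(c("2016-01-08", "2016-01-15", "2016-02-15", "2016-02-20",
                   "2016-03-07", "2016-03-15", "2016-03-17", "2016-04-02",
                   "2016-04-08", "2016-04-09")),
  zip = c("1150", "3000", "1150", "2000", "2000",
          "2000", "2000", "1150", "3000", "3000"),
  purchaseAmount = c(5L, 15L, 16L, 18L, 19L, 11L, 6L, 17L, 7L, 20L)
)

window_days <- 10  # assumed window length in days

# For each row i, add up purchaseAmount over every row with the same zip
# whose date lies in [date[i] - window_days, date[i]]; row i is included.
sums <- numeric(nrow(DT))
for (i in seq_len(nrow(DT))) {
  in_window <- DT$zip == DT$zip[i] &
    DT$date >= DT$date[i] - window_days &
    DT$date <= DT$date[i]
  sums[i] <- sum(DT$purchaseAmount[in_window])
}
DT[, loopCol := sums]  # hypothetical column name
print(DT)
```

Each iteration scans the whole table, so the work grows quadratically with the number of rows. That is the main reason to prefer a join or a cumulative-sum approach on larger data.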

Recommended answer

This seems to work:

DT[, new_col := 
  DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1), 
    sum(purchaseAmount)
  , by=.EACHI ]$V1
]


          date  zip purchaseAmount new_col
 1: 2016-01-08 1150              5       5
 2: 2016-01-15 3000             15      15
 3: 2016-02-15 1150             16      16
 4: 2016-02-20 2000             18      18
 5: 2016-03-07 2000             19      19
 6: 2016-03-15 2000             11      30
 7: 2016-03-17 2000              6      36
 8: 2016-04-02 1150             17      17
 9: 2016-04-08 3000              7       7
10: 2016-04-09 3000             20      27

This uses a "non-equi" join. Effectively, it takes each row, finds all rows that meet the criteria in the on= expression (same zip, and a date between 10 days earlier and the row's own date, inclusive), and then sums purchaseAmount over those matches, once per row (by=.EACHI). In this case, a non-equi join is probably less efficient than some rolling-sum approach.
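As one example of such an alternative, here is a cumulative-sum sketch that uses only base R's cumsum() and findInterval() inside a grouped data.table call; it is not a built-in data.table rolling function. It assumes a 10-day window, assumes no zip has two purchases on the same date (tied dates would be handled differently from the join), and writes to a hypothetical column rollCol. DT is rebuilt from the printed table so the result is reproducible.

```r
library(data.table)

# Rows copied from the table printed in the question.
DT <- data.table(
  date = as.Date(c("2016-01-08", "2016-01-15", "2016-02-15", "2016-02-20",
                   "2016-03-07", "2016-03-15", "2016-03-17", "2016-04-02",
                   "2016-04-08", "2016-04-09")),
  zip = c("1150", "3000", "1150", "2000", "2000",
          "2000", "2000", "1150", "3000", "3000"),
  purchaseAmount = c(5L, 15L, 16L, 18L, 19L, 11L, 6L, 17L, 7L, 20L)
)

window_days <- 10  # assumed window length in days

setorder(DT, date)  # cumulative sums below need date order within each zip
DT[, rollCol := {
  cs <- cumsum(purchaseAmount)  # running total within this zip
  # With left.open = TRUE, findInterval() counts the dates strictly before
  # date - window_days, i.e. the rows that fall outside the window.
  n_before <- findInterval(as.numeric(date - window_days),
                           as.numeric(date), left.open = TRUE)
  # Running total at this row minus the running total of the excluded rows.
  cs - c(0, cs)[n_before + 1]
}, by = zip]
print(DT)
```

After sorting, each group needs one cumulative sum and one findInterval() lookup per row, rather than a separate join match per row.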

How it works

To add columns to a data.table, the usual syntax is DT[, new_col := expression]. Here, the expression actually works even outside of the DT[...]. Try running it on its own:

DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1), 
  sum(purchaseAmount)
, by=.EACHI ]$V1

You can progressively simplify this until it's just the join...

DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1), 
  sum(purchaseAmount)
, by=.EACHI ]
# note that V1 is the default name for computed columns

DT[.(zip = zip, d0 = date - 10, d1 = date), on=.(zip, date >= d0, date <= d1)]
# now we're down to just the join
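To see what each step produces, the sketch below runs the bare join and the per-row aggregation side by side on the data from the question. It counts matches with .N alongside the sum, and gives both outputs explicit names (the hypothetical nInWindow and earlierPurchases) instead of relying on the default V1. The 10-day window is the same as in the answer.

```r
library(data.table)

# Rows copied from the table printed in the question.
DT <- data.table(
  date = as.Date(c("2016-01-08", "2016-01-15", "2016-02-15", "2016-02-20",
                   "2016-03-07", "2016-03-15", "2016-03-17", "2016-04-02",
                   "2016-04-08", "2016-04-09")),
  zip = c("1150", "3000", "1150", "2000", "2000",
          "2000", "2000", "1150", "3000", "3000"),
  purchaseAmount = c(5L, 15L, 16L, 18L, 19L, 11L, 6L, 17L, 7L, 20L)
)

window_days <- 10  # same window as the answer

# The join alone: one output row per (row of i, matching row of DT) pair.
joined <- DT[.(zip = zip, d0 = date - window_days, d1 = date),
             on = .(zip, date >= d0, date <= d1)]

# Adding j and by = .EACHI collapses the matches back to one row per row of i;
# naming the outputs inside .() replaces the default V1 column name.
per_row <- DT[.(zip = zip, d0 = date - window_days, d1 = date),
              on = .(zip, date >= d0, date <= d1),
              .(nInWindow = .N, earlierPurchases = sum(purchaseAmount)),
              by = .EACHI]

DT[, `:=`(nInWindow = per_row$nInWindow,
          earlierPurchases = per_row$earlierPurchases)]
print(DT)
```

The bare join returns 14 rows for these 10 input rows, because rows 6 and 10 each match two purchases and row 7 matches three. by = .EACHI folds them back to one row per input row, in the same order as DT, which is why per_row can be assigned straight back.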

The join syntax is like x[i, on=.(xcol = icol, xcol2 < icol2)], as documented in the doc page that opens when you type ?data.table into an R console with the data.table package loaded.
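The same x[i, on = ...] form also works with a separate i table whose column names differ from DT's. In this sketch, the table lookup and its columns z, lo and hi are hypothetical; the join keeps the purchases in zip 2000 dated from 2016-03-01 to 2016-03-20.

```r
library(data.table)

# Rows copied from the table printed in the question.
DT <- data.table(
  date = as.Date(c("2016-01-08", "2016-01-15", "2016-02-15", "2016-02-20",
                   "2016-03-07", "2016-03-15", "2016-03-17", "2016-04-02",
                   "2016-04-08", "2016-04-09")),
  zip = c("1150", "3000", "1150", "2000", "2000",
          "2000", "2000", "1150", "3000", "3000"),
  purchaseAmount = c(5L, 15L, 16L, 18L, 19L, 11L, 6L, 17L, 7L, 20L)
)

# Hypothetical lookup table: one zip and one date window.
lookup <- data.table(z = "2000",
                     lo = as.Date("2016-03-01"),
                     hi = as.Date("2016-03-20"))

# zip = z pairs an x column with a differently named i column (xcol = icol);
# date >= lo and date <= hi are the non-equi conditions (xcol2 < icol2 style).
res <- DT[lookup, on = .(zip = z, date >= lo, date <= hi)]
print(res)
```

Three purchases fall in that window (19, 11 and 6), so sum(res$purchaseAmount) is 36.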

To get started with data.table, I'd suggest reviewing the vignettes. After that, this'll probably look a lot more legible.
