Subset and aggregate an original data.table based on a different column

Views: 69  Published: 2020/10/15 20:10:12  Tags: r, data.table

Question

This is surprisingly difficult, but I am trying to do what the title says. Suppose I have a data.table dat. For every row where column B is not NA, I want a new column holding the cumulative sum of column C, up to and including that row, over the rows whose column A equals that row's B value. Rows where B is NA should stay at 0.

library(data.table)
dat = data.table(A=c(1,2,3,1,4,5,1,2,3),B=c(1,1,1,NA,1,NA,2,NA,2),C=c(1,12,24.2,251,2,1,2,3,-1))
dat[,cumsum:=0]  # default value; rows with NA in B keep this 0

So the data looks like:

> dat
   A  B     C
1: 1  1   1.0
2: 2  1  12.0
3: 3  1  24.2
4: 1 NA 251.0
5: 4  1   2.0
6: 5 NA   1.0
7: 1  2   2.0
8: 2 NA   3.0
9: 3  2  -1.0

I would like the output to be:

> dat
   A  B     C cumsum
1: 1  1   1.0      1
2: 2  1  12.0      1
3: 3  1  24.2      1
4: 1 NA 251.0      0
5: 4  1   2.0    252
6: 5 NA   1.0      0
7: 1  2   2.0     12
8: 2 NA   3.0      0
9: 3  2  -1.0     15

Is there an efficient data.table way to do this? I could do it with loops, but that would be quite slow. I feel this must be doable in a more scalable way, but I'm stuck.

Recommended answer

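One possible approach is a non-equi self join. Each row with a non-NA B is matched to every row whose A equals that B and whose row number is not greater than its own, and C is summed over the matches: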

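# rn records the original row order so the join can be limited to earlier rows
dat[, rn := .I]
# for each row with non-NA B: sum C over rows with A == B and rn <= this row's rn
dat[!is.na(B), cumsum := dat[.SD, on=.(A=B, rn<=rn), sum(x.C), by=.EACHI]$V1]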

Output:

   A  B     C cumsum rn
1: 1  1   1.0      1  1
2: 2  1  12.0      1  2
3: 3  1  24.2      1  3
4: 1 NA 251.0      0  4
5: 4  1   2.0    252  5
6: 5 NA   1.0      0  6
7: 1  2   2.0     12  7
8: 2 NA   3.0      0  8
9: 3  2  -1.0     15  9

Data:

dat = data.table(A=c(1,2,3,1,4,5,1,2,3),B=c(1,1,1,NA,1,NA,2,NA,2),C=c(1,12,24.2,251,2,1,2,3,-1))
dat[,cumsum:=0]
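
One way to sanity-check the non-equi self join is to compare it with the slow loop the question mentions. The sketch below is only an illustration: `naive_cumsum` is a hypothetical helper written for this comparison and is not part of the original answer. It assumes column A contains no NA values, as in the sample data.

```r
library(data.table)

dat <- data.table(
  A = c(1, 2, 3, 1, 4, 5, 1, 2, 3),
  B = c(1, 1, 1, NA, 1, NA, 2, NA, 2),
  C = c(1, 12, 24.2, 251, 2, 1, 2, 3, -1)
)

# Hypothetical reference implementation: for each row i with a non-NA B,
# sum C over rows 1..i whose A equals B[i]; rows with NA in B get 0.
naive_cumsum <- function(dt) {
  vapply(seq_len(nrow(dt)), function(i) {
    if (is.na(dt$B[i])) return(0)
    idx <- seq_len(i)
    sum(dt$C[idx][dt$A[idx] == dt$B[i]])
  }, numeric(1))
}

expected <- naive_cumsum(dat)

# Non-equi self join, as in the answer
dat[, cumsum := 0][, rn := .I]
dat[!is.na(B),
    cumsum := dat[.SD, on = .(A = B, rn <= rn), sum(x.C), by = .EACHI]$V1]

# Both computations should agree on the sample data
check <- all.equal(dat$cumsum, expected)
print(check)
```

The loop scans all earlier rows for every row, so its cost grows quadratically with the number of rows, which is the scaling problem the question wants to avoid.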

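Edit: adding another approach inspired by Frank's answer. It precomputes the running total of C within each A group (cs), then uses a rolling join to pick, for each row with a non-NA B, the latest running total at or before that row: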

dat = data.table(A=c(1,2,3,1,4,5,1,2,3),B=c(1,1,1,NA,1,NA,2,NA,2),C=c(1,12,24.2,251,2,1,2,3,-1))
dat[, rn := .I][, cs := cumsum(C), A]
dat[, cumsum := 0][
    !is.na(B), cumsum :=  dat[.SD, on=.(A=B, rn), allow.cartesian=TRUE, roll=TRUE, x.cs]]
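
To see what the rolling join looks up, it may help to probe a single row by hand. The sketch below rebuilds the `cs` column and then queries it for row 5 of the sample data (B = 1, rn = 5). The one-row table `probe` and the variable `looked_up` are hypothetical names introduced only for this illustration.

```r
library(data.table)

dat <- data.table(
  A = c(1, 2, 3, 1, 4, 5, 1, 2, 3),
  B = c(1, 1, 1, NA, 1, NA, 2, NA, 2),
  C = c(1, 12, 24.2, 251, 2, 1, 2, 3, -1)
)

# Row number, plus the running total of C within each A group
dat[, rn := .I][, cs := cumsum(C), by = A]

# Running totals for group A == 1 (original rows 1, 4 and 7)
print(dat[A == 1, .(A, rn, C, cs)])

# Hypothetical probe standing in for row 5: B = 1, rn = 5.
# roll = TRUE matches A == B exactly, then rolls back on rn to the
# last row of that group with rn <= 5, which is original row 4.
probe <- data.table(B = 1, rn = 5L)
looked_up <- dat[probe, on = .(A = B, rn), roll = TRUE, x.cs]
print(looked_up)  # 252, i.e. 1 + 251
```

Because the running totals are computed once per group, each lookup is a single rolled match rather than a fresh sum over all earlier rows.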
