R data.table conditional aggregation

Question

I'm facing what I think is a tough aggregation problem with data.table. I have the following data.table:

# Requires library(data.table); same data as the original dput() output,
# minus the .internal.selfref pointer, which cannot be parsed back in.
d <- data.table(
  id1 = c("a", "a", "a", "b", "b", "c", "c"),
  id2 = c("x", "y", "z", "x", "u", "y", "z"),
  val = c(2, 1, 2, 1, 3, 4, 3)
)

I would like to create conditional aggregates of the val column based on the second column, id2. For a given id2 value, the aggregate should include every id1 group that has at least one row with that id2 value, and sum val over all rows of those groups. I'll step through an example to show what I mean.

The conditional aggregate for x (first row, second column) would include the val values 2, 1, 2 from id1 = a and 1, 3 from id1 = b, because id2 = x exists in both groups, but no values from id1 = c. The result is 2 + 1 + 2 + 1 + 3 = 9. I want that 9 as a fourth column in every row where id2 = x appears.
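The arithmetic above can be checked by hand for the single value id2 = x. This is only a sketch of the rule as described, not a general solution; the names groups_with_x and x_sum are made up for illustration.

```r
library(data.table)

# Same data as in the question, rebuilt with data.table()
d <- data.table(
  id1 = c("a", "a", "a", "b", "b", "c", "c"),
  id2 = c("x", "y", "z", "x", "u", "y", "z"),
  val = c(2, 1, 2, 1, 3, 4, 3)
)

# id1 groups that contain at least one row with id2 == "x"
groups_with_x <- unique(d[id2 == "x", id1])

# Sum val over every row that belongs to one of those groups
x_sum <- d[id1 %in% groups_with_x, sum(val)]
print(x_sum)
```

Here groups_with_x holds a and b, so the sum runs over the five rows of those two groups and skips both rows of id1 = c.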

Likewise, I want to do this for all id2 values. So the final output would be

    id1 id2 val c.sum
1:   a   x   2     9
2:   a   y   1    12
3:   a   z   2    12
4:   b   x   1     9
5:   b   u   3     4
6:   c   y   4    12
7:   c   z   3    12

Is this possible in R, data.table? Or any other package/method? Thanks in advance

Recommended answer

Given that d is your input structure:

library(data.table)

d[,c.sum:=sum(d$val[d$id1 %in% id1]),by=id2][]

How it works: by=id2 groups the input data table d by id2; d$id1 %in% id1 selects the rows of d whose id1 matches any id1 in the group under consideration; sum(d$val[...]) sums val over those rows; finally, c.sum:=sum(...) adds the result to d as column c.sum. The trailing [] is only there to print the result.

The output is:

#    id1 id2 val c.sum
# 1:   a   x   2     9
# 2:   a   y   1    12
# 3:   a   z   2    12
# 4:   b   x   1     9
# 5:   b   u   3     4
# 6:   c   y   4    12
# 7:   c   z   3    12
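The one-liner rescans all of d once per id2 group. As a cross-check, the same column can also be built from joins. This is a sketch of an alternative, not part of the original answer; the helper names totals, pairs, total, csum and total_by_id2 are assumptions introduced here.

```r
library(data.table)

d <- data.table(
  id1 = c("a", "a", "a", "b", "b", "c", "c"),
  id2 = c("x", "y", "z", "x", "u", "y", "z"),
  val = c(2, 1, 2, 1, 3, 4, 3)
)

# Total of val for each id1 group
totals <- d[, .(total = sum(val)), by = id1]

# Each (id1, id2) combination once, so a group is never counted twice
pairs <- unique(d[, .(id1, id2)])

# Attach the group total to every combination (update join)
pairs[totals, on = "id1", total := i.total]

# For each id2, add up the totals of all id1 groups that contain it
csum <- pairs[, .(total_by_id2 = sum(total)), by = id2]

# Write the result back onto d as column c.sum
d[csum, on = "id2", c.sum := i.total_by_id2]
print(d)
```

Taking unique (id1, id2) pairs matters when an id1 group contains the same id2 more than once; the %in% version handles that case implicitly.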
