R data.table conditional aggregation
Question
I'm faced with what I think is a tough problem with aggregations on data.table. I have the following data.table:
library(data.table)
data.table(id1 = c("a", "a", "a", "b", "b", "c", "c"),
           id2 = c("x", "y", "z", "x", "u", "y", "z"),
           val = c(2, 1, 2, 1, 3, 4, 3))
I would like to create conditional aggregates on the val column for this data, based on the second column, id2. The aggregate for a given id2 value should include only those id1 groups that have at least one row with that id2 value. I'll step through an example to show what I mean.
The conditional aggregate for x (row 1, column 2) would include the val values 2, 1, 2 from id1 = a and 1, 3 from id1 = b, because id2 = x exists for both of them, but no values from id1 = c. The result is 2 + 1 + 2 + 1 + 3 = 9. I want that 9 as a fourth column in every row where id2 = x appears.
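The arithmetic for x can be reproduced by hand in two steps. This is a minimal sketch; the names d, ids_with_x and x_sum are chosen here for illustration and do not come from the question.

```r
library(data.table)

# Sample data from the question (the name `d` is an assumption)
d <- data.table(id1 = c("a", "a", "a", "b", "b", "c", "c"),
                id2 = c("x", "y", "z", "x", "u", "y", "z"),
                val = c(2, 1, 2, 1, 3, 4, 3))

# Step 1: which id1 groups contain at least one row with id2 == "x"?
ids_with_x <- d[id2 == "x", unique(id1)]   # "a" "b"

# Step 2: sum val over every row belonging to those id1 groups
x_sum <- d[id1 %in% ids_with_x, sum(val)]  # 2 + 1 + 2 + 1 + 3 = 9
```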
Likewise, I want to do this for all id2 values. So the final output would be:
   id1 id2 val c.sum
1:   a   x   2     9
2:   a   y   1    12
3:   a   z   2    12
4:   b   x   1     9
5:   b   u   3     4
6:   c   y   4    12
7:   c   z   3    12
Is this possible in R with data.table, or with any other package or method? Thanks in advance.
Recommended answer
Given that d is your input structure:
library(data.table)
d[,c.sum:=sum(d$val[d$id1 %in% id1]),by=id2][]
How it works: by=id2 groups the input data table d by id2; d$id1 %in% id1 selects the rows in d whose id1 matches any id1 in the group under consideration; sum(d$val[...]) takes the sum of the values from those rows; finally, c.sum:=sum(...) adds a column c.sum to d. The trailing [] is needed only for printing.
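To make the per-group evaluation visible, the sketch below inspects what id1 holds inside each id2 group and computes the same sum without modifying d. The names groups and per_group are illustrative only.

```r
library(data.table)

d <- data.table(id1 = c("a", "a", "a", "b", "b", "c", "c"),
                id2 = c("x", "y", "z", "x", "u", "y", "z"),
                val = c(2, 1, 2, 1, 3, 4, 3))

# Inside j, `id1` is only the slice of the column for the current id2 group
groups <- d[, .(ids = paste(unique(id1), collapse = ",")), by = id2]
# x -> "a,b", y -> "a,c", z -> "a,c", u -> "b"

# `d$id1` and `d$val` still refer to the full table, so the sum runs over
# every row whose id1 occurs in the current group
per_group <- d[, .(c.sum = sum(d$val[d$id1 %in% id1])), by = id2]
# x -> 9, y -> 12, z -> 12, u -> 4
```

Using := instead of .( ) in j, as the answer does, writes each group's value back onto all of that group's rows.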
The output is:
#    id1 id2 val c.sum
# 1:   a   x   2     9
# 2:   a   y   1    12
# 3:   a   z   2    12
# 4:   b   x   1     9
# 5:   b   u   3     4
# 6:   c   y   4    12
# 7:   c   z   3    12
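A possible variant, not part of the answer above, precomputes one total per id1 and then sums those totals within each id2 group, which avoids rescanning the whole table for every group. This is a sketch under the assumption that the helper names totals and tot are acceptable; they are hypothetical.

```r
library(data.table)

d <- data.table(id1 = c("a", "a", "a", "b", "b", "c", "c"),
                id2 = c("x", "y", "z", "x", "u", "y", "z"),
                val = c(2, 1, 2, 1, 3, 4, 3))

# Total of val per id1 group: a = 5, b = 4, c = 7
totals <- d[, .(tot = sum(val)), by = id1]

# Update join: attach each row's id1 total as a helper column
d[totals, on = "id1", tot := i.tot]

# Within each id2 group, add the totals of the distinct id1 values present
d[, c.sum := sum(tot[!duplicated(id1)]), by = id2]

# Drop the helper column
d[, tot := NULL]
```

The !duplicated(id1) guard matters only if the same (id1, id2) pair can occur more than once; in the sample data each pair is unique.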