R data.table conditional aggregation

Question

I'm facing what I think is a tough aggregation problem with data.table. I have the following data.table:

library(data.table)

# Rebuilt from the dput() output; the .internal.selfref pointer cannot be parsed back in
d <- data.table(
  id1 = c("a", "a", "a", "b", "b", "c", "c"),
  id2 = c("x", "y", "z", "x", "u", "y", "z"),
  val = c(2, 1, 2, 1, 3, 4, 3)
)

I would like to create conditional aggregates of the val column based on the second column, id2. For a given id2 value, the aggregate includes only those id1 groups that contain at least one row with that id2 value, and it sums val over all rows of those groups. I'll step through an example to show what I mean.

The conditional aggregate for x (the id2 value in the first row) includes the val values 2, 1, 2 from id1 = a and 1, 3 from id1 = b, because id2 = x exists in both of those groups, but no values from id1 = c, where it does not. The result is 2 + 1 + 2 + 1 + 3 = 9. I want that 9 as a 4th column in every row where id2 = x appears.
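The arithmetic for x can be checked directly. This is only a sketch of the worked example, not part of the original question; the names d, groups_with_x and sum_x are chosen here for illustration.

```r
library(data.table)

# Same data as in the question (the name d is an assumption)
d <- data.table(
  id1 = c("a", "a", "a", "b", "b", "c", "c"),
  id2 = c("x", "y", "z", "x", "u", "y", "z"),
  val = c(2, 1, 2, 1, 3, 4, 3)
)

# id1 groups that contain at least one row with id2 == "x"
groups_with_x <- d[id2 == "x", unique(id1)]   # "a" "b"

# Sum val over every row that belongs to one of those groups
sum_x <- d[id1 %in% groups_with_x, sum(val)]  # 2 + 1 + 2 + 1 + 3 = 9
```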

Likewise, I want to do this for all id2 values. So the final output would be:

    id1 id2 val c.sum
1:   a   x   2     9
2:   a   y   1    12
3:   a   z   2    12
4:   b   x   1     9
5:   b   u   3     4
6:   c   y   4    12
7:   c   z   3    12

Is this possible in R with data.table, or with any other package or method? Thanks in advance.

Recommended answer

With d as the input structure:

library(data.table)

d[,c.sum:=sum(d$val[d$id1 %in% id1]),by=id2][]

How it works: by=id2 groups the input data table d by id2; d$id1 %in% id1 selects the rows of d whose id1 matches any id1 in the group under consideration; sum(d$val[...]) sums val over those rows; finally, c.sum:=sum(...) adds a column c.sum to d. The trailing [] is needed only to print the result.
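To make the role of the bare id1 inside by=id2 more concrete, the sketch below lists which id1 values each id2 group sees and then applies the same expression to a copy of the table. The names groups and res, and the use of copy(), are illustrative assumptions rather than part of the answer.

```r
library(data.table)

# Same data as in the question (the name d is an assumption)
d <- data.table(
  id1 = c("a", "a", "a", "b", "b", "c", "c"),
  id2 = c("x", "y", "z", "x", "u", "y", "z"),
  val = c(2, 1, 2, 1, 3, 4, 3)
)

# Inside by = id2, the bare name id1 holds only the current group's id1 values
groups <- d[, .(ids = paste(unique(id1), collapse = ",")), by = id2]
# x -> "a,b", y -> "a,c", z -> "a,c", u -> "b"

# d$id1 and d$val still refer to the full table, so each group sums val
# over every row whose id1 occurs in that group; copy() leaves d untouched
res <- copy(d)[, c.sum := sum(d$val[d$id1 %in% id1]), by = id2]
# res$c.sum is 9 12 12 9 4 12 12
```

Working on a copy is only a convenience for experimenting; the answer's one-liner modifies d by reference.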

The output is:

#    id1 id2 val c.sum
# 1:   a   x   2     9
# 2:   a   y   1    12
# 3:   a   z   2    12
# 4:   b   x   1     9
# 5:   b   u   3     4
# 6:   c   y   4    12
# 7:   c   z   3    12
