Aggregating if each observation can belong to multiple groups

Question

I want to aggregate data by group. However, each observation can belong to several groups (e.g. observation 1 belongs to groups A and B). I could not find a nice way to do this with data.table. Currently, I create a logical variable for each possible group, which is TRUE if the observation belongs to that group. I am looking for a better way to do this than the one presented below. I would also like to know how to achieve this with the tidyverse.

library(data.table)
# Data
set.seed(1)
TF <- c(TRUE, FALSE)
time <- rep(1:4, each = 5)
df <- data.table(time = time, x = rnorm(20), groupA = sample(TF, size = 20, replace = TRUE),
                                             groupB = sample(TF, size = 20, replace = TRUE),
                                             groupC = sample(TF, size = 20, replace = TRUE))

# This should be nicer and less repetitive
df[groupA == TRUE, .(A = sum(x)), by = time][
  df[groupB == TRUE, .(B = sum(x)), by = time], on = "time"][
    df[groupC == TRUE, .(C = sum(x)), by = time], on = "time"]

# desired output
   time          A          B         C
1:    1         NA  0.9432955 0.1331984
2:    2  1.2257538  0.2427420 0.1882493
3:    3 -0.1992284 -0.1992284 1.9016244
4:    4  0.5327774  0.9438362 0.9276459
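
For the tidyverse part of the question, which the answer below does not address, here is a minimal sketch, assuming dplyr (1.0.0 or later, for `.groups`) and tidyr (1.1.0 or later, for `names_sort`) are installed. It reshapes the group flags to long format, keeps only the memberships, sums `x` per time and group, and widens the result again. The names `member`, `total` and `res_tidy` are illustrative, not from the question. Empty (time, group) combinations come out as `NA`, as in the desired output.

```r
library(dplyr)
library(tidyr)

# Same simulated data as in the question, as a plain data.frame
set.seed(1)
TF <- c(TRUE, FALSE)
df <- data.frame(time   = rep(1:4, each = 5),
                 x      = rnorm(20),
                 groupA = sample(TF, size = 20, replace = TRUE),
                 groupB = sample(TF, size = 20, replace = TRUE),
                 groupC = sample(TF, size = 20, replace = TRUE))

res_tidy <- df %>%
  # one row per (observation, group); "groupA" becomes "A", etc.
  pivot_longer(starts_with("group"), names_to = "group",
               names_prefix = "group", values_to = "member") %>%
  # keep only the groups each observation actually belongs to
  filter(member) %>%
  group_by(time, group) %>%
  summarise(total = sum(x), .groups = "drop") %>%
  # back to one column per group; missing (time, group) cells become NA
  pivot_wider(names_from = group, values_from = total, names_sort = TRUE)

res_tidy
```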


Answer

data.table

df[, lapply(.SD[, .(groupA, groupB, groupC)]*x, sum), time]
# > df[, lapply(.SD[, .(groupA, groupB, groupC)]*x, sum), time]
#    time     groupA     groupB    groupC
# 1:    1  0.0000000  0.9432955 0.1331984
# 2:    2  1.2257538  0.2427420 0.1882493
# 3:    3 -0.1992284 -0.1992284 1.9016244
# 4:    4  0.5327774  0.9438362 0.9276459
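
Note that this returns 0, not NA, for time 1 in group A: multiplying `x` by a logical flag turns non-members into zeros, so a group with no members at that time sums to 0. If the NA of the desired output matters, one possible variant (a sketch; `group_cols` and `res_na` are illustrative names) subsets `x` by each flag and returns `NA_real_` when no row in the time slice belongs to the group:

```r
library(data.table)

# Same simulated data as in the question
set.seed(1)
TF <- c(TRUE, FALSE)
df <- data.table(time   = rep(1:4, each = 5),
                 x      = rnorm(20),
                 groupA = sample(TF, size = 20, replace = TRUE),
                 groupB = sample(TF, size = 20, replace = TRUE),
                 groupC = sample(TF, size = 20, replace = TRUE))

group_cols <- c("groupA", "groupB", "groupC")

# For each logical group column g within a time slice, sum x over the rows
# where g is TRUE; return NA (instead of 0) when no row belongs to the group.
res_na <- df[, lapply(.SD, function(g) if (any(g)) sum(x[g]) else NA_real_),
             by = time, .SDcols = group_cols]
res_na
```

Where a group does have members, the values are the same as those of the `.SD*x` version; only the empty cells differ.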

Or, more programmatically (thanks to @chinsoon12 for the comment):

df[, lapply(.SD*x, sum), by=.(time), .SDcols=paste0("group", c("A","B","C"))]
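
The `.SDcols` vector can also be derived from the column names rather than typed out, and the result columns renamed to match the headers of the desired output. A minimal sketch, assuming every group indicator column, and only those, has a name starting with `group`:

```r
library(data.table)

# Same simulated data as in the question
set.seed(1)
TF <- c(TRUE, FALSE)
df <- data.table(time   = rep(1:4, each = 5),
                 x      = rnorm(20),
                 groupA = sample(TF, size = 20, replace = TRUE),
                 groupB = sample(TF, size = 20, replace = TRUE),
                 groupC = sample(TF, size = 20, replace = TRUE))

# Pick up every column whose name starts with "group", whatever follows
group_cols <- grep("^group", names(df), value = TRUE)

res <- df[, lapply(.SD * x, sum), by = time, .SDcols = group_cols]

# Rename groupA/groupB/groupC to A/B/C (setnames works by reference)
setnames(res, group_cols, sub("^group", "", group_cols))
res
```

Adding a new indicator column such as `groupD` then requires no change to the aggregation call.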

If you want the result in long format, you can do:

df[, colSums(.SD*x), by=.(time), .SDcols=paste0("group", c("A","B","C"))]
### with an indicator column for the group:
df[, .(total = colSums(.SD*x), group = c("A","B","C")), by=.(time), .SDcols=paste0("group", c("A","B","C"))]
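
An alternative that avoids the indicator arithmetic altogether is to reshape to long format first with data.table's `melt()`, aggregate, and cast back with `dcast()`. This is a sketch, not part of the original answer; `member`, `total`, `long`, `agg` and `wide` are illustrative names. Because no aggregation function is passed to `dcast()`, (time, group) combinations with no members are filled with `NA`, as in the question's desired output:

```r
library(data.table)

# Same simulated data as in the question
set.seed(1)
TF <- c(TRUE, FALSE)
df <- data.table(time   = rep(1:4, each = 5),
                 x      = rnorm(20),
                 groupA = sample(TF, size = 20, replace = TRUE),
                 groupB = sample(TF, size = 20, replace = TRUE),
                 groupC = sample(TF, size = 20, replace = TRUE))

group_cols <- grep("^group", names(df), value = TRUE)

# 1. Long format: one row per (observation, group) pair;
#    `member` says whether the observation belongs to that group.
long <- melt(df, id.vars = c("time", "x"), measure.vars = group_cols,
             variable.name = "group", value.name = "member")

# 2. Keep memberships only and sum x per time and group.
agg <- long[member == TRUE, .(total = sum(x)), by = .(time, group)]

# 3. Back to wide format; each (time, group) pair is unique in agg,
#    so no aggregation is needed and missing pairs become NA.
wide <- dcast(agg, time ~ group, value.var = "total")
wide
```

The intermediate `agg` table is already the long-format result, one row per non-empty (time, group) pair.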
