data.table: generate multiple columns and summarize them
Question
I am trying to learn data.table syntax. I have the basics of simple summarizations down, but I don't see how to use data.table to generate new columns from an existing column and then summarize them.
Here's an MWE in which I use dplyr and base tools to make multiple columns from one and then summarize by the grouping variables:
Current input
## fact1 fact2 X0
## 1 b 2 9
## 2 a 2 6
## 3 b 1 7
## 4 c 2 3
## 5 a 1 8
## 6 a 1 4
## 7 a 1 5
## 8 a 1 1
## 9 b 1 2
## 10 b 2 10
Base + dplyr code
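# The listing above presumably predates R 3.6.0, which changed sample();
# on newer R, set.seed(10, sample.kind = "Rounding") should reproduce it.
set.seed(10)
dat <- data.frame(
  fact1 = factor(sample(c('a', 'b', 'c'), 10, TRUE)),
  fact2 = factor(sample(1:2, 10, TRUE)),
  X0 = sample(1:10, 10)
)
add <- function(x, y) x + y
z <- sample(1:10, 6, FALSE)

library(dplyr)
z %>%
  lapply(., add, dat[, 'X0']) %>%       # one vector X0 + z[i] per element of z
  do.call(cbind, .) %>%                 # bind them into a 10 x 6 matrix
  cbind(dat, .) %>%                     # append it to dat (columns named 1..6)
  data.frame() %>%                      # make.names() turns 1..6 into X1..X6
  group_by(fact1, fact2) %>%
  summarise(across(everything(), sum))  # older dplyr: summarise_each(funs(sum))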
Desired output
## Source: local data frame [5 x 9]
## Groups: fact1
##
## fact1 fact2 X0 X1 X2 X3 X4 X5 X6
## 1 a 1 18 42 22 26 46 30 34
## 2 a 2 6 12 7 8 13 9 10
## 3 b 1 9 21 11 13 23 15 17
## 4 b 2 19 31 21 23 33 25 27
## 5 c 2 3 9 4 5 10 6 7
While I'm asking for a data.table-specific solution, I think that also seeing clever base, dplyr, etc. solutions may make this question appeal to a broader readership.
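Since the question invites base-R approaches, here is one possible sketch using only base R (`lapply()` plus `aggregate()`). It hard-codes the printed input so the result does not depend on the R version's `sample()`. Note that `z` is not printed in the question: the value `c(6, 1, 2, 7, 3, 4)` is an assumption inferred from the single-row group (a, 2) of the desired output, where X1..X6 minus X0 = 6 gives 6, 1, 2, 7, 3, 4.

```r
add <- function(x, y) x + y

# Hard-coded copy of the input listing (assumption: z is inferred from the
# desired output, since the question never prints it)
dat <- data.frame(
  fact1 = factor(c("b", "a", "b", "c", "a", "a", "a", "a", "b", "b")),
  fact2 = factor(c(2, 2, 1, 2, 1, 1, 1, 1, 1, 2)),
  X0    = c(9L, 6L, 7L, 3L, 8L, 4L, 5L, 1L, 2L, 10L)
)
z <- c(6L, 1L, 2L, 7L, 3L, 4L)

# One new column X0 + z[i] per element of z, named X1..X6
new_cols <- setNames(lapply(z, add, dat$X0), paste0("X", seq_along(z)))
wide <- data.frame(dat, new_cols)

# "." on the left-hand side means every column not used for grouping,
# so X0..X6 are each summed within every fact1 x fact2 combination
res <- aggregate(. ~ fact1 + fact2, data = wide, FUN = sum)
print(res)
```

Unlike the data.table and dplyr versions, `aggregate()` returns the groups sorted by the grouping factors rather than in order of first appearance.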
Answer
There may be a better way, but this works:
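library(data.table)
# dat, z and add() are the objects created in the question
setDT(dat)[, paste0("X", 1:6) := lapply(z, add, X0)  # add X1..X6 = X0 + z[i] to dat by reference
          ][, lapply(.SD, sum), by = .(fact1, fact2)]  # then sum every non-grouping column per group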
# fact1 fact2 X0 X1 X2 X3 X4 X5 X6
# 1: b 2 19 31 21 23 33 25 27
# 2: a 2 6 12 7 8 13 9 10
# 3: b 1 9 21 11 13 23 15 17
# 4: c 2 3 9 4 5 10 6 7
# 5: a 1 18 42 22 26 46 30 34
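The `:=` step above adds X1..X6 to `dat` itself by reference, so `dat` keeps those six columns after the summary. If only the summary is wanted, a minimal sketch is to build each Xi inside the grouped `j` expression instead, which leaves `dat` with its three original columns. As before, the hard-coded input and `z = c(6, 1, 2, 7, 3, 4)` are assumptions (z is inferred from the desired output, not given in the question).

```r
library(data.table)

add <- function(x, y) x + y

# Hard-coded copy of the input listing (assumption: z inferred from the desired output)
dat <- data.table(
  fact1 = factor(c("b", "a", "b", "c", "a", "a", "a", "a", "b", "b")),
  fact2 = factor(c(2, 2, 1, 2, 1, 1, 1, 1, 1, 2)),
  X0    = c(9L, 6L, 7L, 3L, 8L, 4L, 5L, 1L, 2L, 10L)
)
z <- c(6L, 1L, 2L, 7L, 3L, 4L)

# For each group, return a named list: X0 = sum(X0), then Xi = sum(X0 + z[i]);
# nothing is assigned with :=, so dat is not modified
res <- dat[, c(list(X0 = sum(X0)),
               setNames(lapply(z, function(v) sum(add(X0, v))),
                        paste0("X", seq_along(z)))),
           by = .(fact1, fact2)]
print(res)
```

Because `add()` is plain addition here, each summed column also equals `sum(X0) + .N * z[i]` (a group of `.N` rows adds `z[i]` once per row), so `X0 + z[i]` need not be formed at all; the `lapply()` form is kept because it works for any `add()`.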