R data.table: Sum by all existing combinations in table
Problem description
I have a data.table out like this (in reality it is much larger):
    code weights group
 1:    2   0.387     1
 2:    1   0.399     1
 3:    2   1.610     1
 4:    3   1.323     2
 5:    2   0.373     2
 6:    1   0.212     2
 7:    3   0.316     3
 8:    2   0.569     3
 9:    1   0.120     3
10:    1   0.354     3
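To make the answers below reproducible, here is one way to rebuild the sample table shown above. This is a sketch: the values are copied from the printout, and storing code and group as integers is an assumption, since the question does not show the column types.

```r
library(data.table)

# Rebuild the sample table from the printout above.
# Assumption: code and group are integers; the question does not show the types.
out <- data.table(
  code    = c(2L, 1L, 2L, 3L, 2L, 1L, 3L, 2L, 1L, 1L),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L)
)
```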
It has 3 groups with different codes (column 1). In group #1 the code 3 does not appear, while in the other groups it does.
Then I want to sum the weights for every group and code combination. I do this with the following command:
sum.dt <- out[,.(sum(weights)), by=list(code,group)][order(-V1)]
This works well, but the result lacks the combination group 1 / code 3 because that pair does not occur in the out table. I would like sum.dt to contain all possible combinations: if a combination does not occur in the source table, its sum should be 0, i.e. column V1 should be 0 in that row.
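Before fixing it, it can help to see exactly which combinations are absent. A minimal sketch, assuming the columns are named code and group as above; missing_combos() is a hypothetical helper name, not part of data.table. It builds the full grid of distinct codes and groups with CJ() and keeps only the pairs that have no match in the data (an anti-join with !).

```r
library(data.table)

# Hypothetical helper: list code/group pairs that never occur in `dt`.
# CJ(..., unique = TRUE) builds every pair of the distinct codes and groups;
# the `!` join keeps only the grid rows with no match in `dt`.
missing_combos <- function(dt) {
  CJ(code = dt$code, group = dt$group, unique = TRUE)[!dt, on = .(code, group)]
}
```

For the sample data, missing_combos(out) returns a single row: code 3, group 1.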
Any idea how I could achieve this?
Recommended answer
Using CJ (cross join) you can add the missing combinations:
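library(data.table)
setkey(out, code, group)
# Join out against every code/group pair (absent pairs get NA weights),
# sum the weights per pair, replace the NA sums with 0, and print ([])
out[CJ(code, group, unique = TRUE)
    ][, lapply(.SD, sum), by = .(code, group)
    ][is.na(weights), weights := 0][]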
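A variant of the same idea, sketched under the assumption that you would rather not call setkey() (which reorders out in place): aggregate first, then join the sums onto the full code x group grid with on=. The helper name sum_all_combos() is hypothetical, and the result keeps the V1 column name from the question.

```r
library(data.table)

# Hypothetical helper: sum weights per observed pair, then right-join the
# sums onto the full grid of distinct codes x groups. Pairs absent from
# `dt` come back with V1 = NA, which is then set to 0.
sum_all_combos <- function(dt) {
  sums <- dt[, .(V1 = sum(weights)), by = .(code, group)]
  grid <- CJ(code = dt$code, group = dt$group, unique = TRUE)
  res  <- sums[grid, on = .(code, group)]
  res[is.na(V1), V1 := 0]
  res[]
}
```

Because the join happens after aggregation, each grid pair is matched against at most one summed row rather than the raw rows, and out itself is left unsorted.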
which gives:
code group weights
1: 1 1 0.399
2: 1 2 0.212
3: 1 3 0.474
4: 2 1 1.997
5: 2 2 0.373
6: 2 3 0.569
7: 3 1 0.000
8: 3 2 1.323
9: 3 3 0.316
Or with xtabs, as @alexis_laz showed in the comments:
xtabs(weights ~ group + code, out)
which gives:
     code
group     1     2     3
    1 0.399 1.997 0.000
    2 0.212 0.373 1.323
    3 0.474 0.569 0.316
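The xtabs() result is a group-by-code matrix of summed weights. For comparison, base R's tapply() can produce the same shape; its default argument (available since R 3.4.0) sets the value used for empty cells, so missing pairs become 0 instead of NA. A sketch; group_code_sums() is a hypothetical name.

```r
# Hypothetical helper: sum weights per group/code cell with base R.
# default = 0 fills cells that have no observations (needs R >= 3.4.0).
group_code_sums <- function(df) {
  tapply(df$weights, list(group = df$group, code = df$code), sum, default = 0)
}
```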
If you want this output as a long-form data frame, you can wrap the xtabs code in the melt function of the reshape2 (or data.table) package:
library(reshape2)
res <- melt(xtabs(weights ~ group + code, out))
which gives:
> class(res)
[1] "data.frame"
> res
group code value
1 1 1 0.399
2 2 1 0.212
3 3 1 0.474
4 1 2 1.997
5 2 2 0.373
6 3 2 0.569
7 1 3 0.000
8 2 3 1.323
9 3 3 0.316
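If you would rather not depend on reshape2, base R can also flatten the table: as.data.frame() on an xtabs object returns one row per cell. In this sketch, responseName = "value" is chosen to mirror the melt() column name, and long_sums() is a hypothetical helper; note that group and code come back as factors rather than numbers.

```r
# Hypothetical helper: cross-tabulate summed weights, then flatten the
# table to one row per group/code cell (empty cells hold 0).
long_sums <- function(df) {
  as.data.frame(xtabs(weights ~ group + code, data = df), responseName = "value")
}
```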
You could also use a combination of dplyr and tidyr:
library(dplyr)
library(tidyr)
out %>%
  # add the missing code/group pairs with a weight of 0
  complete(code, group, fill = list(weights = 0)) %>%
  group_by(code, group) %>%
  summarise(V1 = sum(weights), .groups = "drop")
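The steps can also be done in the other order: summarise the observed pairs first, then let complete() add the missing ones with a sum of 0. A sketch assuming dplyr >= 1.0.0 (for the .groups argument); sum_all_combos_tidy() is a hypothetical name.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: sum weights per observed code/group pair, then
# expand to every combination of the codes and groups present, filling
# the new rows with V1 = 0.
sum_all_combos_tidy <- function(df) {
  df %>%
    group_by(code, group) %>%
    summarise(V1 = sum(weights), .groups = "drop") %>%
    complete(code, group, fill = list(V1 = 0))
}
```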