data.table: Sum by all existing combinations in table
Problem description
I have a data.table out like this (in reality it is much larger):

        code weights group
     1:    2   0.387     1
     2:    1   0.399     1
     3:    2   1.610     1
     4:    3   1.323     2
     5:    2   0.373     2
     6:    1   0.212     2
     7:    3   0.316     3
     8:    2   0.569     3
     9:    1   0.120     3
    10:    1   0.354     3
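For anyone who wants to reproduce the examples, the printed table can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the printout above, while storing code and group as integers is an assumption, since the post only shows the printed output.

```r
library(data.table)

# Values copied from the printed table; integer type for code/group is assumed.
out <- data.table(
  code    = c(2L, 1L, 2L, 3L, 2L, 1L, 3L, 2L, 1L, 1L),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L)
)

# The combination group 1 / code 3 is absent from the data.
out[code == 3 & group == 1]
```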
The table has 3 groups with different codes (column 1). In group 1, code 3 does not appear, while it does appear in the other two groups.
Then, I want to sum the weights for every group and code combination. I achieve this with this command:

    sum.dt <- out[, .(sum(weights)), by = list(code, group)][order(-V1)]
This works well, but the result does not include the combination of group 1 with code 3, because that combination is not in the out table. I would like to have all possible combinations in sum.dt, and if a combination does not occur in the source table, it should sum to 0, meaning that column V1 should be 0 in that row.

Any idea how I could achieve this?
Solution

Using CJ (cross join) you can add the missing combinations:

    library(data.table)
    setkey(out, code, group)
    # join on every code/group pair, sum per pair, then turn the NA sums into 0
    out[CJ(code, group, unique = TRUE)
        ][, lapply(.SD, sum), by = .(code, group)
          ][is.na(weights), weights := 0]

gives:

       code group weights
    1:    1     1   0.399
    2:    1     2   0.212
    3:    1     3   0.474
    4:    2     1   1.997
    5:    2     2   0.373
    6:    2     3   0.569
    7:    3     1   0.000
    8:    3     2   1.323
    9:    3     3   0.316
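The same idea can also be written without setting a key, by aggregating first and then joining the sums onto the full grid of combinations. The following is a sketch rather than the answer's own code: the names sums, all_combos and res are made up for illustration, and the table is rebuilt inline from the values in the question so the block runs on its own.

```r
library(data.table)

# Data from the question (integer type for code/group is an assumption).
out <- data.table(
  code    = c(2L, 1L, 2L, 3L, 2L, 1L, 3L, 2L, 1L, 1L),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L)
)

# Step 1: sum weights for the combinations that actually occur.
sums <- out[, .(weights = sum(weights)), by = .(code, group)]

# Step 2: build every code/group pair with CJ (sorted, duplicates dropped).
all_combos <- CJ(code = out$code, group = out$group, unique = TRUE)

# Step 3: join the sums onto the full grid; pairs without a match get NA.
res <- sums[all_combos, on = .(code, group)]

# Step 4: replace the NA sums with 0, by reference.
res[is.na(weights), weights := 0]
res[]
```

Aggregating before the join keeps the intermediate table small, which may matter when out is much larger than the number of distinct combinations.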
Or with xtabs, as @alexis_laz showed in the comments:

    xtabs(weights ~ group + code, out)

which gives:

         code
    group     1     2     3
        1 0.399 1.997 0.000
        2 0.212 0.373 1.323
        3 0.474 0.569 0.316
If you want to get this output in a long-form dataframe, you can wrap the xtabs code in the melt function of the reshape2 (or data.table) package:

    library(reshape2)
    res <- melt(xtabs(weights ~ group + code, out))

which gives:

    > class(res)
    [1] "data.frame"
    > res
      group code value
    1     1    1 0.399
    2     2    1 0.212
    3     3    1 0.474
    4     1    2 1.997
    5     2    2 0.373
    6     3    2 0.569
    7     1    3 0.000
    8     2    3 1.323
    9     3    3 0.316
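If an extra package is not wanted for the reshaping step, base R's as.data.frame method for tables should give a comparable long layout. This is a sketch under stated assumptions, not part of the original answer: the name res2 and the responseName value "weights" are my own choices, and the data is rebuilt as a plain data.frame from the values in the question. Note that group and code come back as factors here, not as numbers.

```r
# Data from the question, as a plain data.frame (no packages needed).
out <- data.frame(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# xtabs fills combinations that never occur with 0;
# as.data.frame then unrolls the table into one row per cell.
res2 <- as.data.frame(xtabs(weights ~ group + code, out),
                      responseName = "weights")
res2
```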
You could also do this with a combination of dplyr and tidyr:

    library(dplyr)
    library(tidyr)
    out %>%
      complete(code, group, fill = list(weights = 0)) %>%
      group_by(code, group) %>%
      summarise(sum(weights))