data.table: Sum by all existing combinations in table
I have a data.table out like this (in reality it is much larger):
    code weights group
 1:    2   0.387     1
 2:    1   0.399     1
 3:    2   1.610     1
 4:    3   1.323     2
 5:    2   0.373     2
 6:    1   0.212     2
 7:    3   0.316     3
 8:    2   0.569     3
 9:    1   0.120     3
10:    1   0.354     3
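For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example table. Only the printed output is shown above, so the plain numeric column types are an assumption, and counts is a hypothetical helper name used just to show which combinations exist.

```r
library(data.table)

# Rebuild the ten example rows from the printed table (column types assumed numeric).
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Count rows per code/group pair; pairs with no rows do not show up at all.
counts <- out[, .N, by = .(code, group)]
```

With three codes and three groups there are nine possible pairs, but counts has fewer rows because code 3 never occurs in group 1.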
It has 3 groups with different codes (column 1). In group 1, code 3 does not appear, while it does appear in the other groups.
Then, I want to sum the weights for every group and code combination. I achieve this with this command:
sum.dt <- out[,.(sum(weights)), by=list(code,group)][order(-V1)]
This works well, but the result lacks the combination of group 1 with code 3 because that combination is not in the out table. I would like sum.dt to contain all possible combinations, and if a combination does not occur in the source table, its sum should be 0, meaning column V1 should be 0 in that row.
Any idea how I could achieve this?
Using CJ (cross join) you can add the missing combinations:
library(data.table)
setkey(out, code, group)
out[CJ(code, group, unique = TRUE)
][, lapply(.SD, sum), by = .(code, group)
][is.na(weights), weights := 0]
gives:
   code group weights
1:    1     1   0.399
2:    1     2   0.212
3:    1     3   0.474
4:    2     1   1.997
5:    2     2   0.373
6:    2     3   0.569
7:    3     1   0.000
8:    3     2   1.323
9:    3     3   0.316
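A variant of the same idea, sketched here under assumptions, aggregates first and then joins the sums onto the full grid with on=, so the key of out is left untouched. The names sums, all_pairs and res are hypothetical, and out is rebuilt from the question's printed table with assumed numeric columns.

```r
library(data.table)

# Example data rebuilt from the question (column types assumed numeric).
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Step 1: sum the weights for the combinations that actually occur.
sums <- out[, .(weights = sum(weights)), by = .(code, group)]

# Step 2: build every code/group pair with a cross join.
all_pairs <- CJ(code = unique(out$code), group = unique(out$group))

# Step 3: join the sums onto the full grid; unmatched pairs come back as NA.
res <- sums[all_pairs, on = .(code, group)]

# Step 4: replace the NA sums of the missing combinations with 0.
res[is.na(weights), weights := 0]
```

Aggregating before the join keeps the joined table small, which may matter when out is much larger than the grid of combinations.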
Or with xtabs, as @alexis_laz showed in the comments:
xtabs(weights ~ group + code, out)
which gives:
     code
group     1     2     3
    1 0.399 1.997 0.000
    2 0.212 0.373 1.323
    3 0.474 0.569 0.316
If you want to get this output in a long-form dataframe, you can wrap the xtabs code in the melt function of the reshape2 (or data.table) package:
library(reshape2)
res <- melt(xtabs(weights ~ group + code, out))
which gives:
> class(res)
[1] "data.frame"
> res
  group code value
1     1    1 0.399
2     2    1 0.212
3     3    1 0.474
4     1    2 1.997
5     2    2 0.373
6     3    2 0.569
7     1    3 0.000
8     2    3 1.323
9     3    3 0.316
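If reshape2 is not available, a similar long table can probably be had from base R alone, since as.data.frame has a method for tables. This is a sketch rather than the answer's method: out is rebuilt here as a plain data.frame with assumed numeric columns, and res_base is a hypothetical name. Note that group and code come back as factors in this version.

```r
# Example data rebuilt from the question (column types assumed numeric).
out <- data.frame(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Cross-tabulate the summed weights; combinations that never occur get 0.
tab <- xtabs(weights ~ group + code, out)

# Convert the table to long form, naming the value column "value".
res_base <- as.data.frame(tab, responseName = "value")
```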
You could also do this with a combination of dplyr and tidyr:
library(dplyr)
library(tidyr)
out %>%
complete(code, group, fill = list(weights=0)) %>%
group_by(code, group) %>%
summarise(sum(weights))