R data.table: Sum by all existing combinations in table


Problem description

I have a data.table out like this (in reality it is much larger):

> out
    code weights group
 1:    2   0.387     1
 2:    1   0.399     1
 3:    2   1.610     1
 4:    3   1.323     2
 5:    2   0.373     2
 6:    1   0.212     2
 7:    3   0.316     3
 8:    2   0.569     3
 9:    1   0.120     3
10:    1   0.354     3
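
For anyone who wants to reproduce the example, the table can be rebuilt roughly as follows. This is a sketch: the printout does not show the column types, so storing all three columns as plain numerics is an assumption.

```r
library(data.table)

# Sample data copied from the printout above; column types are assumed numeric
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)
```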

It has 3 groups with different codes (column 1). In group #1, code 3 does not appear, while in the other groups it does.

Then, I want to sum the weights for every group and code combination. I achieve this with this command:

 sum.dt <- out[,.(sum(weights)), by=list(code,group)][order(-V1)]

This works well, but the result does not contain the combination of group 1 with code 3, because that combination is not in the out table. I would like to have all possible combinations in sum.dt, and if a combination does not occur in the source table, it should sum to 0, meaning the column V1 should be 0 in that row.

Any idea how I could achieve this?

Recommended answer

Using CJ (cross join) you can add the missing combinations:

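library(data.table)
setkey(out, code, group)   # key the table so the join matches on code and group
out[CJ(code, group, unique = TRUE)             # join against every code/group combination
    ][, lapply(.SD, sum), by = .(code, group)  # sum the weights per combination
      ][is.na(weights), weights := 0]          # combinations absent from out are NA; set them to 0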

which gives:

   code group weights
1:    1     1   0.399
2:    1     2   0.212
3:    1     3   0.474
4:    2     1   1.997
5:    2     2   0.373
6:    2     3   0.569
7:    3     1   0.000
8:    3     2   1.323
9:    3     3   0.316
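
A variant of the same idea, sketched here on the sample data from the question: aggregate first, then join the sums onto the full grid built by CJ, using on= so the table does not have to be keyed. The names sums, grid and res are illustrative only, and the numeric column types are an assumption.

```r
library(data.table)

# Sample data from the question (column types assumed numeric)
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Sum the weights for the combinations that actually occur
sums <- out[, .(weights = sum(weights)), by = .(code, group)]

# Every code/group combination, built from the distinct values of each column
grid <- CJ(code = out$code, group = out$group, unique = TRUE)

# One row per grid row; combinations missing from sums get NA weights
res <- sums[grid, on = .(code, group)]

# Replace the NA sums with 0
res[is.na(weights), weights := 0]
print(res)
```

Aggregating before the join keeps the joined table small, which may matter when out is much larger than the number of combinations.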






Or with xtabs, as @alexis_laz showed in the comments:

xtabs(weights ~ group + code, out)

which gives:

     code
group     1     2     3
    1 0.399 1.997 0.000
    2 0.212 0.373 1.323
    3 0.474 0.569 0.316

If you want to get this output as a long-format data frame, you can wrap the xtabs call in the melt function of the reshape2 (or data.table) package:

library(reshape2)
res <- melt(xtabs(weights ~ group + code, out))

which gives:

> class(res)
[1] "data.frame"
> res
  group code value
1     1    1 0.399
2     2    1 0.212
3     3    1 0.474
4     1    2 1.997
5     2    2 0.373
6     3    2 0.569
7     1    3 0.000
8     2    3 1.323
9     3    3 0.316
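
If an extra package is not wanted, base R's as.data.frame method for tables appears to give a similar long-format result. This is a sketch rather than part of the original answer: res2 is an illustrative name, the sample data is rebuilt as a plain data frame, and note that group and code come back as factors here.

```r
# Sample data from the question as a plain data frame (types assumed numeric)
out <- data.frame(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# xtabs fills combinations that never occur with 0;
# as.data.frame turns the table into one row per group/code cell
res2 <- as.data.frame(xtabs(weights ~ group + code, out), responseName = "value")
print(res2)
```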






You can also use a combination of dplyr and tidyr:

library(dplyr)
library(tidyr)
out %>%
  complete(code, group, fill = list(weights=0)) %>%
  group_by(code, group) %>% 
  summarise(sum(weights))
