data.table: Sum by all existing combinations in table


Question

I have a data.table out like this (in reality it is much larger):

out <-      code weights group
        1:    2   0.387      1
        2:    1   0.399      1
        3:    2   1.610      1
        4:    3   1.323      2
        5:    2   0.373      2                                            
        6:    1   0.212      2
        7:    3   0.316      3
        8:    2   0.569      3
        9:    1   0.120      3
       10:    1   0.354      3
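
For anyone who wants to reproduce this, the printed table above can be rebuilt roughly as follows. This is only a sketch: the column types (integer `code` and `group`, numeric `weights`) are an assumption, since the printout does not show them.

```r
library(data.table)

# Rebuild the example table from the printout above.
# Integer code/group and numeric weights are assumed types.
out <- data.table(
  code    = c(2L, 1L, 2L, 3L, 2L, 1L, 3L, 2L, 1L, 1L),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373,
              0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L)
)
```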

It has 3 groups with different codes (column 1). In group 1, code 3 does not appear, while in the other groups it does.

Then, I want to sum the weights for every group and code combination. I achieve this with this command:

sum.dt <- out[,.(sum(weights)), by=list(code,group)][order(-V1)]

This works well but it does not have the combination Group 1 with Code 3 because it is not in the out table. I would like to have all possible combinations in sum.dt, and if the combination does not occur in the source table, it should sum up to 0, meaning the column V1 should be 0 in this row.

Any idea how I could achieve this?

Solution

Using CJ (cross join) you can add the missing combinations:

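library(data.table)
setkey(out, code, group)
# CJ() builds every code/group pair; joining on it adds the missing pairs with NA weights
out[CJ(code, group, unique = TRUE)
    ][, lapply(.SD, sum), by = .(code, group)
      ][is.na(weights), weights := 0]  # a missing pair sums to NA, so set it to 0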

gives:

   code group weights
1:    1     1   0.399
2:    1     2   0.212
3:    1     3   0.474
4:    2     1   1.997
5:    2     2   0.373
6:    2     3   0.569
7:    3     1   0.000
8:    3     2   1.323
9:    3     3   0.316
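
If the chained call is hard to follow, the same idea can be written step by step. The sketch below is a variation rather than the exact code above: it aggregates first and then joins the sums onto the full grid with `on =` instead of `setkey`. The names `sums`, `grid` and `res` are made up for illustration, and the sample data is rebuilt with assumed column types.

```r
library(data.table)

# Sample data from the question (column types are an assumption).
out <- data.table(
  code    = c(2L, 1L, 2L, 3L, 2L, 1L, 3L, 2L, 1L, 1L),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373,
              0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L)
)

# 1. Sum the weights for the combinations that actually occur.
sums <- out[, .(weights = sum(weights)), by = .(code, group)]

# 2. Build every code/group pair from the distinct values of each column.
grid <- CJ(code = out$code, group = out$group, unique = TRUE)

# 3. Join the sums onto the grid; pairs absent from `sums` get NA weights.
res <- sums[grid, on = .(code, group)]

# 4. Replace the NA sums with 0.
res[is.na(weights), weights := 0]
```

Aggregating before the join keeps the joined table small, which may matter when `out` is much larger than the number of distinct combinations.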


Or with xtabs as @alexis_laz showed in the comments:

xtabs(weights ~ group + code, out)

which gives:

     code
group     1     2     3
    1 0.399 1.997 0.000
    2 0.212 0.373 1.323
    3 0.474 0.569 0.316

If you want to get this output in a long-form dataframe, you can wrap the xtabs code in the melt function of the reshape2 (or data.table) package:

library(reshape2)
res <- melt(xtabs(weights ~ group + code, out))

which gives:

> class(res)
[1] "data.frame"
> res
  group code value
1     1    1 0.399
2     2    1 0.212
3     3    1 0.474
4     1    2 1.997
5     2    2 0.373
6     3    2 0.569
7     1    3 0.000
8     2    3 1.323
9     3    3 0.316
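
If you would rather not load reshape2, base R's `as.data.frame()` method for tables should give a similar long-form result. This is a swapped-in alternative, not what the answer above uses; note that the sum column is named `Freq` and that `group` and `code` come back as factors. The name `long` is just for illustration.

```r
# Sample data from the question as a plain data.frame (column types are assumed).
out <- data.frame(
  code    = c(2L, 1L, 2L, 3L, 2L, 1L, 3L, 2L, 1L, 1L),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373,
              0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L)
)

# xtabs() fills combinations that never occur with 0;
# as.data.frame() then unfolds the table into one row per cell.
long <- as.data.frame(xtabs(weights ~ group + code, out))
```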


You could also do this with a combination of dplyr and tidyr:

library(dplyr)
library(tidyr)
out %>%
  complete(code, group, fill = list(weights=0)) %>%
  group_by(code, group) %>% 
  summarise(sum(weights))
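
A slightly different ordering of the same dplyr/tidyr idea is sketched below: summarise first, then let `complete()` fill in the combinations that are missing. Naming the summary column `weights` and using `.groups = "drop"` are choices made for this sketch, not part of the code above, and the sample data is rebuilt with assumed column types.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (column types are an assumption).
out <- tibble(
  code    = c(2L, 1L, 2L, 3L, 2L, 1L, 3L, 2L, 1L, 1L),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373,
              0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L)
)

res <- out %>%
  group_by(code, group) %>%
  # Give the sum an explicit name and drop the grouping afterwards.
  summarise(weights = sum(weights), .groups = "drop") %>%
  # Add every code/group pair that is missing, with a sum of 0.
  complete(code, group, fill = list(weights = 0))
```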
