data.table: Sum by all existing combinations in table


Problem description

I have a data.table out like this (in reality it is much larger):

out <-      code weights group
        1:    2   0.387      1
        2:    1   0.399      1
        3:    2   1.610      1
        4:    3   1.323      2
        5:    2   0.373      2
        6:    1   0.212      2
        7:    3   0.316      3
        8:    2   0.569      3
        9:    1   0.120      3
       10:    1   0.354      3

It has 3 groups, each containing several codes (column 1). In group 1, code 3 does not appear, while it does appear in the other two groups.
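For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table from the printed rows and lists the code/group pairs that never occur. The helper names `present`, `possible` and `missing_pairs` are made up for illustration and are not part of the original question.

```r
library(data.table)

# Rebuild the example table from the printed rows above.
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Pairs that actually occur in the data (hypothetical helper name).
present <- unique(out[, .(code, group)])

# Every pair that could occur: the cross product of the distinct values.
possible <- CJ(code = unique(out$code), group = unique(out$group))

# Anti-join: keep the rows of `possible` that have no match in `present`.
missing_pairs <- possible[!present, on = .(code, group)]
print(missing_pairs)
```

With the ten rows shown, the only pair this should report is code 3 in group 1.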

Then, I want to sum the weights for every combination of group and code. I achieve this with the following command:

sum.dt <- out[,.(sum(weights)), by=list(code,group)][order(-V1)]

This works well, but the result has no row for group 1 with code 3, because that combination is not in the out table. I would like sum.dt to contain all possible combinations; if a combination does not occur in the source table, its sum should be 0, meaning that column V1 should be 0 in that row.

Any idea how I could achieve this?

Solution

Using CJ (cross join), you can add the missing combinations:

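library(data.table)
setkey(out, code, group)
# 1. join out to every code/group pair (missing pairs get weights = NA)
# 2. sum the weights within each pair
# 3. replace the NA sums of the missing pairs with 0
out[CJ(code, group, unique = TRUE)
    ][, lapply(.SD, sum), by = .(code, group)
      ][is.na(weights), weights := 0]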

gives:

   code group weights
1:    1     1   0.399
2:    1     2   0.212
3:    1     3   0.474
4:    2     1   1.997
5:    2     2   0.373
6:    2     3   0.569
7:    3     1   0.000
8:    3     2   1.323
9:    3     3   0.316
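As a variation on the same idea, the join and the aggregation can probably be folded into one step with `by = .EACHI`, which evaluates the expression once per row of the cross join. This is only a sketch and not the answer's original code: the name `grid` is made up here, and `na.rm = TRUE` is used to turn the unmatched pairs into 0, which assumes the real data has no genuine NA weights that you would want to keep visible.

```r
library(data.table)

# Example data rebuilt from the question.
out <- data.table(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# All code/group pairs (hypothetical name); CJ() sorts its output by default.
grid <- CJ(code = unique(out$code), group = unique(out$group))

# Join `out` to the grid and aggregate once per grid row (by = .EACHI).
# A pair with no match contributes no real weights, so with na.rm = TRUE
# its sum comes out as 0.
sum.dt <- out[grid, on = .(code, group),
              .(weights = sum(weights, na.rm = TRUE)), by = .EACHI]
print(sum.dt)
```

Because no `:=` is involved, `out` is not modified and no key has to be set first.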


Or with xtabs as @alexis_laz showed in the comments:

xtabs(weights ~ group + code, out)

which gives:

     code
group     1     2     3
    1 0.399 1.997 0.000
    2 0.212 0.373 1.323
    3 0.474 0.569 0.316

If you want this output as a long-format data frame, you can wrap the xtabs call in the melt function of the reshape2 (or data.table) package:

library(reshape2)
res <- melt(xtabs(weights ~ group + code, out))

which gives:

> class(res)
[1] "data.frame"
> res
  group code value
1     1    1 0.399
2     2    1 0.212
3     3    1 0.474
4     1    2 1.997
5     2    2 0.373
6     3    2 0.569
7     1    3 0.000
8     2    3 1.323
9     3    3 0.316
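If reshape2 is not installed, base R's `as.data.frame` method for tables should give a similar long layout. The sketch below is an alternative to the melt call, not what the answer used; `tab` is a made-up name, and the conversion of the factor columns back to numbers is an extra step that may not be needed in your case.

```r
# Example data rebuilt from the question (a plain data.frame is enough here).
out <- data.frame(
  code    = c(2, 1, 2, 3, 2, 1, 3, 2, 1, 1),
  weights = c(0.387, 0.399, 1.610, 1.323, 0.373, 0.212, 0.316, 0.569, 0.120, 0.354),
  group   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Cross-tabulate the summed weights; absent pairs become 0.
tab <- xtabs(weights ~ group + code, data = out)

# Long format: one row per cell, with the cell value in a column named "value".
res <- as.data.frame(tab, responseName = "value")

# as.data.frame() returns the classifying columns as factors;
# convert them back to numbers via their labels.
res$group <- as.numeric(as.character(res$group))
res$code  <- as.numeric(as.character(res$code))
print(res)
```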


You could also do this with a combination of dplyr and tidyr:

library(dplyr)
library(tidyr)
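out %>%
  # add the missing code/group pairs with weights = 0
  complete(code, group, fill = list(weights = 0)) %>%
  group_by(code, group) %>%
  # name the result column instead of the default `sum(weights)`
  summarise(weights = sum(weights))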
