Sum all values meeting a criteria for all possible criteria


Question

I have a data.table such as the following:

a <- data.table(color=c("Red","Blue","Red","Green","Red","Blue","Blue"), count=c(1,2,6,4,2,1,1),include=c(1,1,1,1,0,0,1))

> a
     color count include
[1,]   Red     1       1
[2,]  Blue     2       1
[3,]   Red     6       1
[4,] Green     4       1
[5,]   Red     2       0
[6,]  Blue     1       0
[7,]  Blue     1       1

I wish to create a new data.table which has only the unique colour values, and a sum of the count column for each of these that match include=1, like the below:

     colour total
[1,]   Red     7
[2,]  Blue     3
[3,] Green     4  

I have tried the following, which I've had some success with in the past:

> a[,include == 1,list(total=sum(count)),by=colour]
Error in `[.data.table`(a, , include == 1, list(quantity = sum(count)),  : 
  Provide either 'by' or 'keyby' but not both

This same error message is received when a has no key, and when it has a key of colour. I have also tried, with the key set to colour, the following:

> a[,include == 1,list(quantity=sum(count))]
Error in `[.data.table`(a, , include == 1, list(quantity = sum(count))) : 
  Each item in the 'by' or 'keyby' list must be same length as rows in x (7): 1

I can't find any other good solutions. Any help much appreciated.

Recommended answer

library(data.table)
a <- data.table(color=c("Red","Blue","Red","Green","Red","Blue","Blue"), count=c(1,2,6,4,2,1,1),include=c(1,1,1,1,0,0,1))
a[include == 1, list(total=sum(count)), keyby = color]

   color total
1:  Blue     3
2: Green     4
3:   Red     7
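
Both errors in the question appear to come from argument position rather than from keys. Inside a[...], the first slot is i (which rows), the second is j (what to compute), and the next unnamed argument is matched to by, or to keyby when by= is already given by name. In the question's calls the filter sat in the j slot and the list(...) was read as a grouping. A minimal sketch of the same call with each slot labelled; the name res is only for illustration:

```r
library(data.table)

a <- data.table(
  color   = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
  count   = c(1, 2, 6, 4, 2, 1, 1),
  include = c(1, 1, 1, 1, 0, 0, 1)
)

res <- a[include == 1,               # i: keep only rows where include is 1
         list(total = sum(count)),   # j: the aggregate to compute per group
         by = color]                 # by: one result row per colour

print(res)
```

With by the groups come back in order of first appearance (Red, Blue, Green); keyby, as used in the answer, sorts them and sets the key.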



Edit from Matthew:

Or, if include takes (only) the values 0 and 1, then:

a[, list(total=sum(count*include)), keyby = color]


Or, if include contains other values, then:

a[, list(total=sum(count*(include==1))), keyby = color]


where NA may need to be taken into account.
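
As a sketch of one way to handle NA, assuming rows whose include is NA should simply be left out of the total (an assumption; the answer does not say how they should be treated), na.rm = TRUE can be passed to sum. The table a_na is a hypothetical variant of a with one include value set to NA:

```r
library(data.table)

# Hypothetical data: same as `a`, but the third row's include is NA
a_na <- data.table(
  color   = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
  count   = c(1, 2, 6, 4, 2, 1, 1),
  include = c(1, 1, NA, 1, 0, 0, 1)
)

# Without na.rm, the NA product propagates into the Red total
with_na <- a_na[, list(total = sum(count * (include == 1))), keyby = color]

# With na.rm = TRUE, NA products are dropped, so that row is treated as excluded
no_na <- a_na[, list(total = sum(count * (include == 1), na.rm = TRUE)), keyby = color]

print(with_na)
print(no_na)
```

If NA should instead count as included, or should make the whole group NA, a different rule would be needed.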

Those might be more efficient because they avoid the vector scan in i, but it depends a lot on data size and properties. They only need working memory as large as the largest group, whereas include==1 in i needs at least one vector allocated as long as nrow(a).
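
A quick sanity check, sketched on the example data, that the three forms give the same totals. The names r1, r2 and r3 are only for illustration:

```r
library(data.table)

a <- data.table(
  color   = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
  count   = c(1, 2, 6, 4, 2, 1, 1),
  include = c(1, 1, 1, 1, 0, 0, 1)
)

r1 <- a[include == 1, list(total = sum(count)), keyby = color]      # filter in i
r2 <- a[, list(total = sum(count * include)), keyby = color]        # 0/1 weights
r3 <- a[, list(total = sum(count * (include == 1))), keyby = color] # logical mask

# Compare the totals column of each result
print(all.equal(r1$total, r2$total))
print(all.equal(r2$total, r3$total))
```

One difference to keep in mind: a colour with no include == 1 rows is absent from the filtered form, but shows up with a total of 0 in the other two. On this data every colour has at least one included row, so the results match.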
