Sum all values meeting a criteria for all possible criteria
Question
I have a data.table such as the following:
a <- data.table(color=c("Red","Blue","Red","Green","Red","Blue","Blue"), count=c(1,2,6,4,2,1,1),include=c(1,1,1,1,0,0,1))
> a
color count include
[1,] Red 1 1
[2,] Blue 2 1
[3,] Red 6 1
[4,] Green 4 1
[5,] Red 2 0
[6,] Blue 1 0
[7,] Blue 1 1
I wish to create a new data.table that has only the unique colour values, with the sum of the count column over the rows where include=1 for each colour, like the below:
colour total
[1,] Red 7
[2,] Blue 3
[3,] Green 4
I have tried the following, a form I've had some success with in the past:
> a[,include == 1,list(total=sum(count)),by=colour]
Error in `[.data.table`(a, , include == 1, list(quantity = sum(count)), :
Provide either 'by' or 'keyby' but not both
The same error message is received when a has no key, and when it has a key of colour. I have also tried, with the key set to colour, the following:
> a[,include == 1,list(quantity=sum(count))]
Error in `[.data.table`(a, , include == 1, list(quantity = sum(count))) :
Each item in the 'by' or 'keyby' list must be same length as rows in x (7): 1
I can't find any other good solutions. Any help much appreciated.
Recommended answer
library(data.table)
a <- data.table(color=c("Red","Blue","Red","Green","Red","Blue","Blue"), count=c(1,2,6,4,2,1,1),include=c(1,1,1,1,0,0,1))
a[include == 1, list(total=sum(count)), keyby = color]
color total
1: Blue 3
2: Green 4
3: Red 7
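The attempts in the question most likely failed because of argument positions. In a[i, j, by], the first slot filters rows, the second computes, and by (or keyby) groups. With the leading comma, include == 1 landed in j and list(...) was taken as a grouping argument. The following is a minimal annotated sketch of the call above, assuming only that the data.table package is installed:

```r
library(data.table)

a <- data.table(
  color   = c("Red", "Blue", "Red", "Green", "Red", "Blue", "Blue"),
  count   = c(1, 2, 6, 4, 2, 1, 1),
  include = c(1, 1, 1, 1, 0, 0, 1)
)

# i  : keep only the rows where include == 1
# j  : compute one sum of count per group
# by : group by color (keyby also sorts the result and sets its key)
res <- a[include == 1, list(total = sum(count)), keyby = color]
print(res)
```

Note that the column is named color in the data, so the grouping argument has to use that spelling rather than colour.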
Edit from Matthew:
Or if include takes (only) values 0 and 1, then:
a[, list(total=sum(count*include)), keyby = color]
Or if include contains other values, then:
a[, list(total=sum(count*(include==1))), keyby = color]
NA values may need to be considered as well.
Those might be more efficient by avoiding the vector scan in i, but it depends a lot on data size and properties. These only need working memory as large as the largest group, whereas include==1 in i needs at least one vector allocated as long as nrow(a).
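To make the NA remark concrete, here is a small sketch on a hypothetical variant of the data (the table b below is invented for illustration and is not from the question). It assumes that an NA in include should count as "not included". It also shows a side effect of the two styles: filtering in i drops colours with no included rows, while multiplying in j keeps them with a total of 0.

```r
library(data.table)

# Hypothetical data: one include value is NA, and Green has no included row.
b <- data.table(
  color   = c("Red", "Blue", "Red", "Green", "Blue"),
  count   = c(1, 2, 6, 4, 1),
  include = c(1, 1, NA, 0, 1)
)

# Filter in i: rows where include is 0 or NA are dropped before grouping,
# so Green does not appear in the result at all.
filtered <- b[include == 1, list(total = sum(count)), keyby = color]

# Multiply in j: every color is kept; na.rm = TRUE discards the NA product,
# which treats the NA row as not included (an assumption of this sketch).
multiplied <- b[, list(total = sum(count * (include == 1), na.rm = TRUE)),
                keyby = color]

print(filtered)
print(multiplied)
```

Here filtered has rows for Blue (3) and Red (1) only, while multiplied also has Green with a total of 0. Which behaviour is wanted depends on whether every colour must appear in the output.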