Use data.table to count and aggregate / summarize a column
Question
I want to count and aggregate (sum) a column in a data.table, but I couldn't find the most efficient way to do this. This seems close to what I want: R summarizing multiple columns with data.table.
My data:
library(data.table)
set.seed(321)
dat <- data.table(MNTH = c(rep(201501, 4), rep(201502, 3), rep(201503, 5), rep(201504, 4)),
                  VAR = sample(c(0, 1), 16, replace = TRUE))
> dat
MNTH VAR
1: 201501 1
2: 201501 1
3: 201501 0
4: 201501 0
5: 201502 0
6: 201502 0
7: 201502 0
8: 201503 0
9: 201503 0
10: 201503 1
11: 201503 1
12: 201503 0
13: 201504 1
14: 201504 0
15: 201504 1
16: 201504 0
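If `set.seed(321)` gives different draws on your R version (R 3.6.0 changed the default algorithm used by `sample()`), your `dat` may not match the table above. A minimal sketch that rebuilds the printed values literally instead of relying on the random number generator:

```r
library(data.table)

# Rebuild the example table exactly as printed, independent of the RNG
dat <- data.table(
  MNTH = c(rep(201501, 4), rep(201502, 3), rep(201503, 5), rep(201504, 4)),
  VAR  = c(1, 1, 0, 0,  0, 0, 0,  0, 0, 1, 1, 0,  1, 0, 1, 0)
)

nrow(dat)  # 16
```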
I want to both count and sum VAR by MNTH using data.table. The desired result:
MNTH COUNT VAR
1 201501 4 2
2 201502 3 0
3 201503 5 2
4 201504 4 2
Recommended answer
The post you are referring to shows how to apply one aggregation function to several columns. If you want to apply different aggregation functions to different columns, you can do:
dat[, .(count = .N, var = sum(VAR)), by = MNTH]
This results in:
MNTH count var
1: 201501 4 2
2: 201502 3 0
3: 201503 5 2
4: 201504 4 2
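The linked post's approach (one function over several columns via `.SD`) can also be combined with `.N` in the same `j`. A minimal sketch, assuming the data are rebuilt literally from the printed table in the question; it names the result columns `COUNT` and `VAR` so they match the desired output exactly:

```r
library(data.table)

# Example data copied from the printed table in the question
dat <- data.table(
  MNTH = c(rep(201501, 4), rep(201502, 3), rep(201503, 5), rep(201504, 4)),
  VAR  = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0)
)

# Different functions per column: .N gives the group size, sum() totals VAR
res1 <- dat[, .(COUNT = .N, VAR = sum(VAR)), by = MNTH]

# Same result via .SD: c() joins the .N list with lapply() over the .SDcols
res2 <- dat[, c(list(COUNT = .N), lapply(.SD, sum)), by = MNTH, .SDcols = "VAR"]

res1
```

With `by`, groups appear in order of first appearance; `keyby = MNTH` would instead sort by MNTH and set it as the key of the result. The `.SD` form is handy when more columns need the same function: just extend `.SDcols`.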
You can also add these values as new columns to your existing dataset by updating it by reference:
dat[, `:=` (count = .N, var = sum(VAR)), by = MNTH]
Result:
> dat
MNTH VAR count var
1: 201501 1 4 2
2: 201501 1 4 2
3: 201501 0 4 2
4: 201501 0 4 2
5: 201502 0 3 0
6: 201502 0 3 0
7: 201502 0 3 0
8: 201503 0 5 2
9: 201503 0 5 2
10: 201503 1 5 2
11: 201503 1 5 2
12: 201503 0 5 2
13: 201504 1 4 2
14: 201504 0 4 2
15: 201504 1 4 2
16: 201504 0 4 2
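Because `:=` modifies the table in place, any other name bound to the same table (for example via `alias <- dat`) sees the change too. A minimal sketch, assuming you want the summary columns on a separate table and the original left untouched; `copy()` makes a deep copy first (the name `dat2` is only illustrative):

```r
library(data.table)

# Example data copied from the printed table in the question
dat <- data.table(
  MNTH = c(rep(201501, 4), rep(201502, 3), rep(201503, 5), rep(201504, 4)),
  VAR  = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0)
)

# Deep copy, so the in-place update below does not touch dat
dat2 <- copy(dat)

# Add the per-group count and sum as new columns, by reference
dat2[, `:=`(count = .N, var = sum(VAR)), by = MNTH]

names(dat)   # "MNTH" "VAR"
names(dat2)  # "MNTH" "VAR" "count" "var"
```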
For further reading about how to use data.table syntax, see the Getting started guides on the GitHub wiki.