Simple frequency tables using data.table
Question
I'm looking for a way to do simple aggregates / counts via data.table.
Consider the iris data, which has 50 observations per species. To count the observations per species I have to summarise over a column other than Species, for example "Sepal.Length".
library(data.table)
dt = as.data.table(iris)
dt[,length(Sepal.Length), Species]
I find this confusing because it looks like I'm doing something on Sepal.Length at first glance, when really it's only Species that matters.
This is what I would prefer to say, but I don't get valid output:
dt[,length(Species), Species]
Correct output, but clumsy code:
> dt[,length(Sepal.Length), Species]
      Species V1
1:     setosa 50
2: versicolor 50
3:  virginica 50
Incorrect output, but nicer code:
> dt[,length(Species), Species]
      Species V1
1:     setosa  1
2: versicolor  1
3:  virginica  1
Is there an elegant way to solve this?
Recommended answer
data.table has a couple of symbols that can be used within the j expression. Notably, .N will give you the number of rows in each group.
See the entry for by in ?data.table:
Advanced: When grouping by by or by i, symbols .SD, .BY and .N may be used in the j expression, defined as follows.
....
.N is an integer, length 1, containing the number of rows in the group.
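As a small illustration of that definition, .N also works without by, in which case the whole (possibly filtered) table is treated as one group. The following is only a sketch on the same iris data; the names n_all and n_setosa are made up for this example.

```r
library(data.table)

dt <- as.data.table(iris)

# No grouping: .N is the number of rows in the whole table
n_all <- dt[, .N]

# Filter in i first: .N is the number of rows that pass the filter
n_setosa <- dt[Species == "setosa", .N]

print(n_all)     # 150
print(n_setosa)  # 50
```

Because .N never refers to a particular column, the call no longer suggests that any one measurement (such as Sepal.Length) is being summarised.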
For example:
dt[, .N, by = Species]
      Species  N
1:     setosa 50
2: versicolor 50
3:  virginica 50
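If the default column name N is not descriptive enough, the count can be given a name inside j. The sketch below assumes the same iris data; the column name count and the variables counts and base_counts are arbitrary choices for this example, and base R's table() is included only as a cross-check.

```r
library(data.table)

dt <- as.data.table(iris)

# Name the count column explicitly instead of relying on the default "N"
counts <- dt[, .(count = .N), by = Species]

# Base R one-way frequency table, used here only to cross-check the result
base_counts <- table(iris$Species)

print(counts)
print(base_counts)
```

Both approaches should report the same per-species counts; the data.table form keeps the result as a data.table, which is convenient when further joins or aggregations follow.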