Simple frequency tables using data.table
I'm looking for a way to do simple aggregates / counts via data.table.
Consider the iris data, which has 50 observations per species. To count the observations per species I have to summarise over a column other than Species, for example Sepal.Length.
library(data.table)
dt = as.data.table(iris)
dt[,length(Sepal.Length), Species]
I find this confusing because it looks like I'm doing something on Sepal.Length at first glance, when really it's only Species that matters.
This is what I would prefer to say, but I don't get valid output:
dt[,length(Species), Species]
Correct input and output, but clunky code:
> dt[,length(Sepal.Length), Species]
      Species V1
1:     setosa 50
2: versicolor 50
3:  virginica 50
Incorrect output, but nicer code:
> dt[,length(Species), Species]
      Species V1
1:     setosa  1
2: versicolor  1
3:  virginica  1
Is there an elegant way around this?
data.table has a couple of symbols that can be used within the j expression. Notably, .N will give you the number of rows in each group.
See ?data.table, under the details for by:
Advanced: When grouping by by or by i, symbols .SD, .BY and .N may be used in the j expression, defined as follows.
....
.N is an integer, length 1, containing the number of rows in the group.
For example:
dt[, .N, by = Species]
      Species  N
1:     setosa 50
2: versicolor 50
3:  virginica 50
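To make the difference concrete, here is a minimal sketch that puts the .N count next to the length(Species) attempt from the question. It assumes the data.table package is installed; the names counts, named, by_len and base_counts are illustrative only and do not come from the original post. As far as I understand it, length(Species) returns 1 because a grouping column is exposed to j as a length-1 value within each group, which matches the output shown in the question.

```r
library(data.table)

dt <- as.data.table(iris)

# .N is the number of rows in the current group
counts <- dt[, .N, by = Species]

# Same count, with an explicit column name (the name "count" is arbitrary)
named <- dt[, .(count = .N), by = Species]

# Inside j, the grouping column Species holds a single value per group,
# so its length is 1 regardless of how many rows the group has
by_len <- dt[, .(len = length(Species)), by = Species]

# Base R cross-check of the per-species counts
base_counts <- table(iris$Species)

print(counts)
print(by_len)
```

Wrapping .N in .( ) only names the result column; the grouping and the counts are unchanged.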