Return multiple columns from data.table aggregation
Problem description
I would like to use data.table as an alternative to aggregate() or ddply(), as these two methods aren't scaling to large objects as efficiently as hoped. Unfortunately, I haven't figured out how to get vector-returning aggregate functions to generate multiple columns in the result from data.table. For example:
# required packages
library(plyr)
library(data.table)
# simulated data
x <- data.table(value=rnorm(100), g=rep(letters[1:5], each=20))
# ddply output that I would like to get from data.table
ddply(data.frame(x), 'g', function(i) quantile(i$value))
g 0% 25% 50% 75% 100%
1 a -1.547495 -0.7842795 0.202456288 0.6098762 2.223530
2 b -1.366937 -0.4418388 -0.085876995 0.7826863 2.236469
3 c -2.064510 -0.6411390 -0.257526983 0.3213343 1.039053
4 d -1.773933 -0.5493362 -0.007549273 0.4835467 2.116601
5 e -0.780976 -0.2315245 0.194869630 0.6698881 2.207800
# not quite what I am looking for:
x[, quantile(value), by=g]
g V1
1: a -1.547495345
2: a -0.784279536
3: a 0.202456288
4: a 0.609876241
5: a 2.223529739
6: b -1.366937074
7: b -0.441838791
8: b -0.085876995
9: b 0.782686277
10: b 2.236468703
Essentially, the output from ddply and aggregate is in wide format, while the output from data.table is in long format. Is the answer reshaping the data, or passing some additional arguments to my data.table call?
Try coercing the result to a list, so that each quantile becomes its own column:
> x[, as.list(quantile(value)), by=g]
g 0% 25% 50% 75% 100%
1: a -1.7507334 -0.632331909 0.07435249 0.7459778 1.428552
2: b -2.2043481 -0.005652353 0.10534325 0.5769475 1.241754
3: c -1.9313985 -1.120737610 -0.26116926 0.6953009 1.360017
4: d -0.7434664 -0.055232431 0.22062823 1.1864389 3.021124
5: e -2.0101657 -0.468674094 0.20209610 0.6286448 2.433152
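The approach above can be sketched as a self-contained script. This is only an illustration: the seed, the probs vector, and the column names q10, q50 and q90 are assumptions made for the example and do not come from the original question.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only so the example is repeatable
x <- data.table(value = rnorm(100), g = rep(letters[1:5], each = 20))

# A plain vector in j is stacked into one column (V1): long format
long <- x[, quantile(value), by = g]

# A list in j gives one column per list element: wide format
wide <- x[, as.list(quantile(value)), by = g]

# Hypothetical variation: pick the probabilities and name the columns yourself
probs <- c(0.1, 0.5, 0.9)
wide_named <- x[, setNames(as.list(quantile(value, probs = probs)),
                           c("q10", "q50", "q90")),
                by = g]

print(wide)
print(wide_named)
```

The difference comes from how data.table treats the value of j: a vector result becomes a single column with one row per element, whereas a list result contributes one column per element, so no separate reshaping step is needed.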