Groupby bins and aggregate in R
Question
I have data with columns a, b, and c:
a b c
1 2 1
2 3 1
9 2 2
1 6 2
The range of 'a' is divided into n (say 3) equal-width bins, and an aggregate function (say max) is computed over the 'b' values in each bin, grouped by 'c' as well.
So the output would look like:
a_bin b_m(c=1) b_m(c=2)
1-3 3 6
4-6 NaN NaN
7-9 NaN 2
This is an M×N table, where M is the number of 'a' bins and N is the number of unique 'c' values (or the full range of 'c').
How can I do this?
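The expected bins 1-3, 4-6, 7-9 for n = 3 suggest equal-width bins over integer values of 'a'. As a hedged sketch, assuming 'a' is integer-valued, the bin breaks can be derived from n instead of being typed by hand. `equal_width_breaks` is a hypothetical helper, not something from the question or the answer. The breaks are built for `findInterval()`, which assigns x to bin k when `breaks[k] <= x < breaks[k + 1]`.

```r
# Hypothetical helper (assumes integer-valued x): n equal-width bins
# covering min(x)..max(x), returned as breaks suitable for findInterval().
equal_width_breaks <- function(x, n) {
  width <- (max(x) - min(x) + 1) / n  # +1 so max(x) falls inside the last bin
  min(x) + (0:n) * width
}

a <- c(1, 2, 9, 1)                  # the 'a' column from the question
breaks <- equal_width_breaks(a, 3)  # 1 4 7 10
bin <- findInterval(a, breaks)      # bin index of each 'a' value: 1 1 3 1
print(breaks)
print(bin)
```

If the range of 'a' is not a multiple of n, `width` is fractional and the bin edges will not fall on integers.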
Answer
I would use a combination of data.table and reshape2, both of which are optimized for speed (no for loops or apply-family functions).
Note that the output omits empty bins (here, 4-6).
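library(data.table)
library(reshape2)  # for dcast; recent data.table versions also ship their own dcast()
# Example data from the question
temp <- data.frame(a = c(1, 2, 9, 1), b = c(2, 3, 2, 6), c = c(1, 1, 2, 2))
v <- c(1, 4, 7, 10)  # bin breaks: [1,4), [4,7), [7,10)
temp$int <- findInterval(temp$a, v)  # bin index of each 'a' value
temp <- setDT(temp)[, list(b_m = max(b)), by = c("c", "int")]  # max of b per (c, bin)
temp <- dcast.data.table(temp, int ~ c, value.var = "b_m")  # one row per bin, one column per c
## colnames(temp) <- c("a_bin", "b_m(c=1)", "b_m(c=2)") # Optional for prettier table
## temp$a_bin <- c("1-3", "7-9") # Optional for prettier table
## a_bin b_m(c=1) b_m(c=2)
## 1 1-3 3 6
## 2 7-9 NA 2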
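The question's expected table keeps the empty 4-6 bin, which the data.table result drops. As a hedged base-R alternative (a swapped-in technique, not part of the answer), `tapply()` over a bin factor whose levels are fixed to all bins returns the full M×N matrix, with NA in cells that have no data. The variable `cc` is just the question's 'c' column renamed so it does not mask `c()`, and the bin labels are generated from the same breaks `v` used in the answer.

```r
# Example data from the question ('c' renamed to cc to avoid masking c())
a  <- c(1, 2, 9, 1)
b  <- c(2, 3, 2, 6)
cc <- c(1, 1, 2, 2)

v <- c(1, 4, 7, 10)  # same breaks as in the answer
labels <- paste(head(v, -1), tail(v, -1) - 1, sep = "-")  # "1-3" "4-6" "7-9"

# Fixing the factor levels to every bin keeps empty bins in the result
a_bin <- factor(findInterval(a, v), levels = seq_along(labels), labels = labels)

# Rows = a bins, columns = unique values of c; tapply() fills empty cells with NA
res <- tapply(b, list(a_bin = a_bin, c = cc), max)
print(res)
```

Empty cells come back as NA rather than the NaN shown in the question; `res[is.na(res)] <- NaN` would match that exactly if needed.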