Groupby bins and aggregate in R


Question

I have data with columns (a, b, c):

a b c
1 2 1
2 3 1
9 2 2
1 6 2

I want to divide the range of 'a' into n (say 3) equal-width bins, apply an aggregate function (say max) to the 'b' values in each bin, and also group the result by 'c'.

So the output would look like:

a_bin  b_m(c=1) b_m(c=2)
1-3     3          6
4-6     NaN        NaN
7-9     NaN        2

The result is an M x N table, where M is the number of 'a' bins and N is the number of unique 'c' values (or its full range).

How should I approach this?

Recommended answer

I would use a combination of data.table and reshape2, which are both optimized for speed (no for loops and no functions from the apply family).

Note that the output won't include the unused bins.

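# Example data from the question
temp <- data.frame(a = c(1, 2, 9, 1), b = c(2, 3, 2, 6), c = c(1, 1, 2, 2))

v <- c(1, 4, 7, 10) # bin boundaries: [1,4), [4,7), [7,10)
temp$int <- findInterval(temp$a, v) # index of the bin each 'a' falls into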

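library(data.table)
# max of b within each (c, bin) group
temp <- setDT(temp)[, list(b_m = max(b)), by = c("c", "int")]

library(reshape2)
# reshape to wide format: one row per bin, one column per value of c
temp <- dcast.data.table(temp, int ~ c, value.var = "b_m")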
## colnames(temp) <- c("a_bin", "b_m(c=1)", "b_m(c=2)") # Optional for prettier table
## temp$a_bin<- c("1-3", "7-9") # Optional for prettier table

##   a_bin b_m(c=1) b_m(c=2)
## 1   1-3        3        6
## 2   7-9       NA        2
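
If the empty bins should appear in the result as well (as in the table in the question), one possible alternative is a base R sketch using cut() and tapply(). This is not part of the answer above. It assumes that 'a' holds integer values, so that n equal-width bins can be derived from min(a) and max(a) + 1; the names n, breaks, labels and res are made up for illustration.

```r
# Example data from the question
temp <- data.frame(a = c(1, 2, 9, 1), b = c(2, 3, 2, 6), c = c(1, 1, 2, 2))

n <- 3 # number of equal-width bins (assumption: 'a' is integer-valued)

# n + 1 boundaries covering [min(a), max(a) + 1); here 1, 4, 7, 10
breaks <- seq(min(temp$a), max(temp$a) + 1, length.out = n + 1)

# Human-readable labels such as "1-3", "4-6", "7-9"
labels <- paste(head(breaks, -1), breaks[-1] - 1, sep = "-")

# cut() returns a factor that keeps every bin as a level, used or not
temp$a_bin <- cut(temp$a, breaks = breaks, right = FALSE, labels = labels)

# tapply() builds a bins-by-c matrix; combinations without data become NA
res <- tapply(temp$b, list(a_bin = temp$a_bin, c = temp$c), max)
print(res)
```

Because cut() keeps all bins as factor levels, res has one row per bin, including "4-6", which holds only NA. For large data the data.table approach above is likely to be faster.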
