Summarising grouped records in a dataframe

Question

I have a data frame in R that looks like this:

> TimeOffset, Source, Length 
> 0         1           1500
> 0.1       1           1000    
> 0.2       1           50
> 0.4       2           25
> 0.6       2           3
> 1.1       1           1500
> 1.4       1           18
> 1.6       2           2500
> 1.9       2           18
> 2.1       1           37
> ...

I want to convert it to:

> TimeOffset, Source, Length
> 0.2         1       2550
> 0.6         2       28
> 1.4         1       1518
> 1.9         2       2518
> ...

Put into words: I want to group consecutive records with the same 'Source' together, then print a single record per group showing the highest time offset in that group, the source, and the sum of the lengths in that group.

The TimeOffset values will always increase.

I suspect this is possible in R, but I really don't know where to start. In a pinch I could export the data frame and do it in, e.g., Python, but I'd prefer to stay within R if possible.

Thanks in advance for any assistance you can provide.

Recommended answer

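First you need to create an id variable that labels your groups explicitly, so that later steps do not rely on the fact that the rows are consecutive. The id increases by one every time Source changes from one row to the next. After that it is pretty straightforward: take the maximum TimeOffset and the sum of Length within each id.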

> dat <- data.frame(    TimeOffset = c(0,.1,.2,.4,.6,1.1,1.4,1.6,1.9,2.1),
+ Source=c(1,1,1,2,2,1,1,2,2,1),
+ Length=c(1500,1000,50,25,3,1500,18,2500,18,37))
> dat
   TimeOffset Source Length
1         0.0      1   1500
2         0.1      1   1000
3         0.2      1     50
4         0.4      2     25
5         0.6      2      3
6         1.1      1   1500
7         1.4      1     18
8         1.6      2   2500
9         1.9      2     18
10        2.1      1     37
> 
> id <- cumsum(c(TRUE,diff(dat$Source)!=0))
> id
 [1] 1 1 1 2 2 3 3 4 4 5
> 
> cbind(TimeOffset=tapply(dat$TimeOffset,id,max),
+ Source=tapply(dat$Source,id,max),
+ Length=tapply(dat$Length,id,sum))
  TimeOffset Source Length
1        0.2      1   2550
2        0.6      2     28
3        1.4      1   1518
4        1.9      2   2518
5        2.1      1     37
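
Note that cbind() returns a matrix here, and that diff() only works because Source is numeric. If a data frame is preferred, or Source might be a character column, the same idea can be wrapped up as in the sketch below. The function name summarise_runs is hypothetical (it is not part of the original answer or of any package), and the column names are assumed to match the question.

```r
# Hypothetical helper, not from the original answer: collapse each run of
# consecutive rows sharing the same Source into a single summary row.
summarise_runs <- function(dat) {
  n <- nrow(dat)
  # Guard: with no rows there is nothing to group.
  if (n == 0) return(dat[0, c("TimeOffset", "Source", "Length")])

  src <- dat$Source
  # TRUE wherever a row starts a new run. Comparing neighbours directly
  # (instead of diff()) also works when Source is a character vector.
  starts <- c(TRUE, src[-1] != src[-n])
  id <- cumsum(starts)  # run id: 1, 1, 1, 2, 2, 3, ...

  data.frame(
    TimeOffset = as.vector(tapply(dat$TimeOffset, id, max)),  # latest offset per run
    Source     = src[starts],                                 # source of each run
    Length     = as.vector(tapply(dat$Length, id, sum))       # total length per run
  )
}

# Same sample data as in the answer above.
dat <- data.frame(
  TimeOffset = c(0, .1, .2, .4, .6, 1.1, 1.4, 1.6, 1.9, 2.1),
  Source     = c(1, 1, 1, 2, 2, 1, 1, 2, 2, 1),
  Length     = c(1500, 1000, 50, 25, 3, 1500, 18, 2500, 18, 37)
)
result <- summarise_runs(dat)
print(result)
```

On the sample data this should give the same five rows as the cbind() call, but as a data frame whose columns keep their own types.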
