Collapsing data frame by selecting one row per group


Problem description

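I'm trying to collapse a data frame by removing all but one row from each group of rows with identical values in a particular column. In other words, I want to keep the first row from each group once the data is sorted by y in descending order (in the example below, that is the row with the largest y).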

For example, I'd like to convert this

> d = data.frame(x=c(1,1,2,4),y=c(10,11,12,13),z=c(20,19,18,17))
> d
  x  y  z
1 1 10 20
2 1 11 19
3 2 12 18
4 4 13 17

Into this:

    x  y  z
1   1 11 19
2   2 12 18
3   4 13 17

I'm using aggregate to do this currently, but the performance is unacceptable with more data:

> d.ordered = d[order(-d$y),]
> aggregate(d.ordered,by=list(key=d.ordered$x),FUN=function(x){x[1]})

I've tried split/unsplit with the same function argument as here, but unsplit complains about duplicate row numbers.

Is rle a possibility? Is there an R idiom to convert rle's length vector into the indices of the rows that start each run, which I can then use to pluck those rows out of the data frame?

Solution

Maybe duplicated() can help:

R> d[ !duplicated(d$x), ]
  x  y  z
1 1 10 20
3 2 12 18
4 4 13 17
R> 

Edit: Shucks, never mind. This picks the first row in each block of repetitions, but you wanted the last. So here is another attempt using plyr:

R> ddply(d, "x", function(z) tail(z,1))
  x  y  z
1 1 11 19
2 2 12 18
3 4 13 17
R> 

Here plyr does the hard work of finding unique subsets, looping over them and applying the supplied function -- which simply returns the last set of observations in a block z using tail(z, 1).
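
For larger data, a base R route may be worth trying as well. The sketch below is not part of the original answer and has not been benchmarked here; the variable names (last_rows, max_y_rows, run_starts, run_ends) are made up for illustration. It shows duplicated() with fromLast = TRUE to keep the last row per group, a sort followed by duplicated() to keep the row with the largest y, and one way to turn rle()'s lengths into row indices, as the question asks. The rle() part assumes that rows with equal x are adjacent.

```r
# Example data from the question
d <- data.frame(x = c(1, 1, 2, 4), y = c(10, 11, 12, 13), z = c(20, 19, 18, 17))

# Last row of each group: scan for duplicates from the end instead of the start
last_rows <- d[!duplicated(d$x, fromLast = TRUE), ]   # keeps rows 2, 3, 4

# Row with the largest y in each group: sort by x, then by y descending,
# and keep the first row seen for each x
d_ordered <- d[order(d$x, -d$y), ]
max_y_rows <- d_ordered[!duplicated(d_ordered$x), ]   # keeps rows 2, 3, 4

# rle() variant (assumes equal x values are adjacent, e.g. d is sorted by x):
# the cumulative sum of the run lengths gives the index of the last row of
# each run, and subtracting the length and adding 1 gives the first row
runs <- rle(d$x)
run_ends <- cumsum(runs$lengths)             # 2, 3, 4
run_starts <- run_ends - runs$lengths + 1    # 1, 3, 4
first_by_run <- d[run_starts, ]
last_by_run <- d[run_ends, ]
```

Unlike the ddply output above, these results keep the original row names (2, 3, 4) rather than renumbering them from 1.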
