dplyr filter: Get rows with minimum of variable, but only the first if multiple minima

Problem description

I want to make a grouped filter using dplyr such that, within each group, only the row with the minimum value of variable x is returned.

My problem is: As expected, in the case of multiple minima all rows with the minimum value are returned. But in my case, I only want the first row if multiple minima are present.

Here's an example:

df <- data.frame(
A=c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
x=c(1, 1, 2, 2, 3, 4, 5, 5, 5),
y=rnorm(9)
)

library(dplyr)
df.g <- group_by(df, A)
filter(df.g, x == min(x))

As expected, all minima are returned:

Source: local data frame [6 x 3]
Groups: A

  A x           y
1 A 1 -1.04584335
2 A 1  0.97949399
3 B 2  0.79600971
4 C 5 -0.08655151
5 C 5  0.16649962
6 C 5 -0.05948012
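
The reason every tied row comes back is that `x == min(x)` is an elementwise comparison, so it is TRUE for each row equal to the group minimum. A minimal base R sketch using group C's values (the vector name `x_c` is made up for illustration):

```r
# Values of x within group "C" from the example above
x_c <- c(5, 5, 5)

# Elementwise comparison: TRUE for every element equal to the minimum
is_min <- x_c == min(x_c)
print(is_min)

# which.min(), by contrast, returns only the position of the first minimum
first_min <- which.min(x_c)
print(first_min)
```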

With ddply, I would have approached the task this way:

library(plyr)
ddply(df, .(A), function(z) {
    z[z$x == min(z$x), ][1, ]
})

... which works:

  A x           y
1 A 1 -1.04584335
2 B 2  0.79600971
3 C 5 -0.08655151

Q: Is there a way to approach this in dplyr? (For speed reasons)

Solution

Just for completeness: Here's the final dplyr solution, derived from the comments of @hadley and @Arun:

library(dplyr)
df.g <- group_by(df, A)
filter(df.g, rank(x, ties.method="first")==1)
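
This works because `rank(x, ties.method = "first")` breaks ties by position, so exactly one row per group gets rank 1: the first row holding the minimum. The sketch below re-creates the example data with a fixed `y` column (an assumption, so that the result is reproducible) and compares the answer's approach with two alternatives that are not part of the original answer: `slice(which.min(x))` and, assuming dplyr 1.0.0 or later, `slice_min()` with `with_ties = FALSE`. The names `res_rank`, `res_slice` and `res_slice_min` are made up for illustration.

```r
library(dplyr)

# Example data from the question; y is replaced by fixed values
# (an assumption for reproducibility, the original used rnorm(9))
df <- data.frame(
  A = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  x = c(1, 1, 2, 2, 3, 4, 5, 5, 5),
  y = 1:9
)

# Ties are ranked in order of appearance, so only the first minimum gets rank 1
tie_ranks <- rank(c(5, 5, 5), ties.method = "first")

# Approach from the answer above
res_rank <- df %>%
  group_by(A) %>%
  filter(rank(x, ties.method = "first") == 1) %>%
  ungroup()

# Alternative: which.min() returns the index of the first minimum in each group
res_slice <- df %>%
  group_by(A) %>%
  slice(which.min(x)) %>%
  ungroup()

# Alternative, assuming dplyr >= 1.0.0: keep one row per group, dropping ties
res_slice_min <- df %>%
  group_by(A) %>%
  slice_min(x, n = 1, with_ties = FALSE) %>%
  ungroup()

print(res_rank)
```

On this data, all three variants should keep one row per group. The two alternatives avoid ranking the whole group just to pick a single row, which may matter for the speed concern raised in the question; that is a plausible expectation, not something measured here.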
