Getting the top values by group
Problem description
Here is a sample data frame:
d <- data.frame(
x = runif(90),
grp = gl(3, 30)
)
I want the subset of d containing the rows with the top 5 values of x for each value of grp.
Using base R, my approach would be something like:
# Sort by x (largest first), split by group, then keep the first 5 rows of each group.
# Note: head() defaults to n = 6, so n = 5 must be passed explicitly.
ordered <- d[order(d$x, decreasing = TRUE), ]
splits <- split(ordered, ordered$grp)
heads <- lapply(splits, head, n = 5)
do.call(rbind, heads)
##              x grp
## 1.19 0.8879631   1
## 1.4  0.8844818   1
## 1.12 0.8596197   1
## 1.26 0.8481809   1
## 1.18 0.8461516   1
## 2.31 0.9751049   2
## 2.34 0.9269764   2
## 2.57 0.8964114   2
## 2.58 0.8896466   2
## 2.45 0.8888834   2
## 3.74 0.9884852   3
## 3.73 0.9837653   3
## 3.83 0.9375398   3
## 3.64 0.9229036   3
## 3.69 0.8021373   3
Using dplyr, I expected this to work:
d %>%
arrange_(~ desc(x)) %>%
group_by_(~ grp) %>%
head(n = 5)
but it only returns the overall top 5 rows.
Swapping head for top_n returns the whole of d.
d %>%
arrange_(~ desc(x)) %>%
group_by_(~ grp) %>%
top_n(n = 5)
How do I get the correct subset?
Recommended answer
From dplyr 1.0.0, "slice_min() and slice_max() select the rows with the minimum or maximum values of a variable, taking over from the confusing top_n()."
d %>% group_by(grp) %>% slice_max(order_by = x, n = 5)
# # A tibble: 15 x 2
# # Groups: grp [3]
# x grp
# <dbl> <fct>
# 1 0.994 1
# 2 0.957 1
# 3 0.955 1
# 4 0.940 1
# 5 0.900 1
# 6 0.963 2
# 7 0.902 2
# 8 0.895 2
# 9 0.858 2
# 10 0.799 2
# 11 0.985 3
# 12 0.893 3
# 13 0.886 3
# 14 0.815 3
# 15 0.812 3
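One detail worth knowing about slice_max(): it keeps ties by default, so a group can return more than n rows when several values are tied at the cutoff. With runif() data ties are very unlikely, but with integer or rounded data they are common. A minimal sketch, assuming dplyr >= 1.0.0; the data frame `ties` and the result names are hypothetical and not part of the question:

```r
library(dplyr)

# Hypothetical toy data: group "a" has two rows tied at the cutoff (x == 2).
ties <- data.frame(
  x   = c(3, 2, 2, 1, 5, 4),
  grp = c("a", "a", "a", "a", "b", "b")
)

# Default (with_ties = TRUE): both tied rows are kept for group "a".
kept_ties <- ties %>%
  group_by(grp) %>%
  slice_max(order_by = x, n = 2)

# with_ties = FALSE: never more than n rows per group.
no_ties <- ties %>%
  group_by(grp) %>%
  slice_max(order_by = x, n = 2, with_ties = FALSE)

count(kept_ties, grp)
count(no_ties, grp)
```

If exactly n rows per group are required, passing with_ties = FALSE is probably the safer choice.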
Pre-dplyr 1.0.0, using top_n:
From ?top_n, about the wt argument:
"The variable to use for ordering [...] defaults to the last variable in the tbl".
The last variable in your data set is "grp", which is not the variable you wish to rank, and which is why your top_n attempt "returns the whole of d": within each group every row has the same value of grp, so all rows tie for first place and top_n keeps all of them. Thus, if you wish to rank by "x" in your data set, you need to specify wt = x.
d %>%
group_by(grp) %>%
top_n(n = 5, wt = x)
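To see the effect of the wt argument directly, the two rankings can be compared side by side. This is a small sketch rather than part of the original answer; it rebuilds d with the same seed as the data block in this answer, and the names `by_grp` and `by_x` are made up for illustration:

```r
library(dplyr)

set.seed(123)
d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

# Ranking by grp, which is what top_n() falls back to when wt is missing:
# all rows in a group are tied, so no row is dropped.
by_grp <- d %>%
  group_by(grp) %>%
  top_n(n = 5, wt = grp)
nrow(by_grp)

# Ranking by x keeps the 5 largest values of x in each group.
by_x <- d %>%
  group_by(grp) %>%
  top_n(n = 5, wt = x)
nrow(by_x)

# top_n() only filters and does not sort, so arrange afterwards if order matters.
by_x %>% arrange(grp, desc(x))
```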
Data:
set.seed(123)
d <- data.frame(
x = runif(90),
grp = gl(3, 30))
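For completeness, the attempt in the question fails because head() is not group-aware: on a grouped data frame it simply returns the first rows of the whole table. A possible fix that stays close to the original pipeline, assuming dplyr >= 1.0.0, is to replace head() with slice_head(), which does respect groups. The names `via_head` and `via_max` below are hypothetical:

```r
library(dplyr)

set.seed(123)
d <- data.frame(
  x   = runif(90),
  grp = gl(3, 30)
)

# Sort first, then take the first 5 rows within each group.
via_head <- d %>%
  arrange(desc(x)) %>%
  group_by(grp) %>%
  slice_head(n = 5)

# The slice_max() approach from the answer, for comparison.
via_max <- d %>%
  group_by(grp) %>%
  slice_max(order_by = x, n = 5)

# Both approaches should select the same values of x.
all.equal(sort(via_head$x), sort(via_max$x))
```

slice_max() is shorter and does not depend on a prior arrange(), so it is likely the better default; the slice_head() form is shown only because it mirrors the question's code.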