How do I only keep the rows with the lowest and highest value in a certain column, by groups?
Question
In short, how do I go from this
structure(list(id = c(1, 2, 3, 4, 5, 6), user = c(1, 1, 1, 2,
2, 2), value = c(1, 3, 5, 2, 5, 9)), .Names = c("id", "user",
"value"), row.names = c(NA, -6L), class = "data.frame")
to this?
structure(list(id = c(1, 3, 4, 6), user = c(1, 1, 2, 2), value = c(1,
5, 2, 9)), .Names = c("id", "user", "value"), row.names = c(NA,
-4L), class = "data.frame")
That is, for each user, I need to keep only the two rows corresponding to the lowest and highest 'value'.
I'd like a solution using dplyr, if possible. Otherwise, any solution is fine.
Recommended answer
We can use slice with which.min/which.max after grouping by 'user' (df1 is the data frame from the question):
library(dplyr)
df1 %>%
group_by(user) %>%
slice(c(which.min(value), which.max(value)))
# id user value
# <dbl> <dbl> <dbl>
#1 1 1 1
#2 3 1 5
#3 4 2 2
#4 6 2 9
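To run the snippet above end to end, here is a minimal self-contained sketch. It assumes the question's input is stored as df1 (rebuilt here with data.frame() instead of structure()). The unique() call is an added guard, not part of the original answer: in a group with a single row, which.min and which.max point at the same row, so without it slice would be asked for that row twice.

```r
library(dplyr)

# Input from the question, rebuilt with data.frame() for readability
df1 <- data.frame(
  id    = c(1, 2, 3, 4, 5, 6),
  user  = c(1, 1, 1, 2, 2, 2),
  value = c(1, 3, 5, 2, 5, 9)
)

res <- df1 %>%
  group_by(user) %>%
  # unique() is an added guard against requesting the same row twice
  slice(unique(c(which.min(value), which.max(value)))) %>%
  ungroup()

res$id  # ids of the rows that were kept
```

Note that which.min and which.max each return only the first matching position, so this variant keeps at most two rows per user even when values are tied.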
Another option is arrange with slice. After grouping by 'user', arrange 'value' in ascending order and slice the first and last row of each 'user':
df1 %>%
group_by(user) %>%
arrange(value) %>%
slice(c(1, n()))
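One detail about the arrange variant: by default arrange() ignores the grouping and sorts the whole data frame by 'value'. The groups themselves are kept, so the first and last row of each group are still that user's smallest and largest value. The sketch below makes the per-group sort explicit with .by_group = TRUE. It assumes the question's data as df1, and the unique() guard for single-row groups is an addition for illustration.

```r
library(dplyr)

# Input from the question
df1 <- data.frame(
  id    = c(1, 2, 3, 4, 5, 6),
  user  = c(1, 1, 1, 2, 2, 2),
  value = c(1, 3, 5, 2, 5, 9)
)

res_sorted <- df1 %>%
  group_by(user) %>%
  # sort by user first, then by value within each user
  arrange(value, .by_group = TRUE) %>%
  # first and last row per group; unique() avoids a repeat when n() == 1
  slice(unique(c(1, n()))) %>%
  ungroup()

res_sorted$id
```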
If there are ties for the min and/or max 'value' and you want to keep all the min and max rows, use filter:
df1 %>%
group_by(user) %>%
filter(value %in% c(min(value), max(value)))
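To illustrate the tie handling, the sketch below uses a hypothetical data frame, df_ties, which is the question's data plus one extra row (id 7) that ties for user 2's maximum. It also shows a base R equivalent built on ave(), since the question allows non-dplyr solutions. Both are illustrations under these assumptions, not part of the original answer.

```r
library(dplyr)

# Hypothetical input: the question's data plus a tied maximum for user 2
df_ties <- data.frame(
  id    = c(1, 2, 3, 4, 5, 6, 7),
  user  = c(1, 1, 1, 2, 2, 2, 2),
  value = c(1, 3, 5, 2, 5, 9, 9)
)

# dplyr: keep every row whose value equals the group's min or max
res_filter <- df_ties %>%
  group_by(user) %>%
  filter(value %in% c(min(value), max(value))) %>%
  ungroup()

# Base R: ave() returns the per-user min/max aligned with the rows
keep <- with(df_ties,
             value == ave(value, user, FUN = min) |
             value == ave(value, user, FUN = max))
res_base <- df_ties[keep, ]

res_filter$id
res_base$id
```

Both results contain rows 6 and 7 for user 2, whereas the slice-based variants would keep only one of them.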