How to use dplyr's summarize and which() to look up min/max values
Question
I have the following data:
Name <- c("Sam", "Sarah", "Jim", "Fred", "James", "Sally", "Andrew", "John", "Mairin", "Kate", "Sasha", "Ray", "Ed")
Age <- c(22,12,31,35,58,82,17,34,12,24,44,67,43)
Group <- c("A", "B", "B", "B", "B", "C", "C", "D", "D", "D", "D", "D", "D")
data <- data.frame(Name, Age, Group)
I would like to use dplyr to:
(1) group the data by "Group"
(2) show the min and max Age within each Group
(3) show the Name of the person with the min and max ages
The following code does this:
library(dplyr)

data %>% group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = Name[which(Age == min(Age))],
            maxAge = max(Age), maxAgeName = Name[which(Age == max(Age))])
This works fine:
  Group minAge minAgeName maxAge maxAgeName
1     A     22        Sam     22        Sam
2     B     12      Sarah     58      James
3     C     17     Andrew     82      Sally
4     D     12     Mairin     67        Ray
However, I run into a problem if there are multiple min or max values:
Name <- c("Sam", "Sarah", "Jim", "Fred", "James", "Sally", "Andrew", "John", "Mairin", "Kate", "Sasha", "Ray", "Ed")
Age <- c(22,31,31,35,58,82,17,34,12,24,44,67,43)
Group <- c("A", "B", "B", "B", "B", "C", "C", "D", "D", "D", "D", "D", "D")
data <- data.frame(Name, Age, Group)
> data %>% group_by(Group) %>%
+ summarize(minAge = min(Age), minAgeName = Name[which(Age == min(Age))],
+ maxAge = max(Age), maxAgeName = Name[which(Age == max(Age))])
Error: expecting a single value
I'm looking for two solutions:
(1) where it doesn't matter which min or max name is shown, just that one is shown (i.e., the first value found)
(2) where, if there are "ties", all minimum values and maximum values are shown
Please let me know if this isn't clear, and thanks in advance!
Recommended answer
You can use which.min and which.max to get the first value.
data %>% group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = Name[which.min(Age)],
            maxAge = max(Age), maxAgeName = Name[which.max(Age)])
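To see why this avoids the error, it may help to look at one tied group in isolation. The sketch below uses only base R and copies the Group B values from the question; the variable names tied_positions, first_position and first_name are just illustrative. which(Age == min(Age)) returns every tied position, whereas which.min(Age) returns only the first one, so the summary gets a single name per group.

```r
# Group B from the question's second data set, where Sarah and Jim tie at 31
Name <- c("Sarah", "Jim", "Fred", "James")
Age  <- c(31, 31, 35, 58)

# which() on a comparison returns every matching position
tied_positions <- which(Age == min(Age))   # positions 1 and 2

# which.min() returns only the position of the first minimum
first_position <- which.min(Age)           # position 1

# Indexing with a single position yields a single name
first_name <- Name[first_position]         # "Sarah"
```

The same reasoning applies to which.max for the maximum.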
To get all values, use e.g. paste with an appropriate collapse argument.
data %>% group_by(Group) %>%
  summarize(minAge = min(Age),
            minAgeName = paste(Name[which(Age == min(Age))], collapse = ", "),
            maxAge = max(Age),
            maxAgeName = paste(Name[which(Age == max(Age))], collapse = ", "))
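As a rough illustration of what collapse is doing here, the base R sketch below again uses the tied Group B values from the question (tied_names and min_age_names are illustrative names only). paste with a collapse argument joins a character vector of any length into one string, which gives the summary a single value per group even when several names tie.

```r
# Group B from the question's second data set, where Sarah and Jim tie at 31
Name <- c("Sarah", "Jim", "Fred", "James")
Age  <- c(31, 31, 35, 58)

# Two names share the minimum age, so this vector has length 2
tied_names <- Name[which(Age == min(Age))]

# collapse joins all elements into one string of length 1
min_age_names <- paste(tied_names, collapse = ", ")   # "Sarah, Jim"
```

Any separator can be passed to collapse; ", " is simply the one used in the answer.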