How to use dplyr's summarize and which() to look up min/max values


Question

I have the following data:

Name <- c("Sam", "Sarah", "Jim", "Fred", "James", "Sally", "Andrew", "John", "Mairin", "Kate", "Sasha", "Ray", "Ed")
Age <- c(22,12,31,35,58,82,17,34,12,24,44,67,43)
Group <- c("A", "B", "B", "B", "B", "C", "C", "D", "D", "D", "D", "D", "D") 
data <- data.frame(Name, Age, Group)

I want to use dplyr to:

(1) group the data by "Group"
(2) show the min and max Age within each Group
(3) show the Name of the person with the min and max ages

The following code does this:

library(dplyr)  # provides %>%, group_by() and summarize()

data %>% group_by(Group) %>%
     summarize(minAge = min(Age), minAgeName = Name[which(Age == min(Age))], 
               maxAge = max(Age), maxAgeName = Name[which(Age == max(Age))])

This works fine:

  Group minAge minAgeName maxAge maxAgeName
1     A     22        Sam     22        Sam
2     B     12      Sarah     58      James
3     C     17     Andrew     82      Sally
4     D     12     Mairin     67        Ray

However, I have a problem if there are multiple min or max values. In the data below, Sarah and Jim in group B are both 31, so the minimum age in that group is tied:

Name <- c("Sam", "Sarah", "Jim", "Fred", "James", "Sally", "Andrew", "John", "Mairin", "Kate", "Sasha", "Ray", "Ed")
Age <- c(22,31,31,35,58,82,17,34,12,24,44,67,43)
Group <- c("A", "B", "B", "B", "B", "C", "C", "D", "D", "D", "D", "D", "D") 
data <- data.frame(Name, Age, Group)

> data %>% group_by(Group) %>%
+   summarize(minAge = min(Age), minAgeName = Name[which(Age == min(Age))], 
+             maxAge = max(Age), maxAgeName = Name[which(Age == max(Age))])
Error: expecting a single value

I'm looking for two solutions:

(1) where it doesn't matter which min or max name is shown, just that one is shown (i.e., the first value found)
(2) where, if there are "ties", all minimum values and maximum values are shown

Please let me know if this isn't clear, and thanks in advance!

Recommended answer

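You can use which.min and which.max to get the first value. Each returns a single index (the position of the first minimum or maximum), so every group yields exactly one name even when there are ties.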

data %>% group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = Name[which.min(Age)], 
            maxAge = max(Age), maxAgeName = Name[which.max(Age)])
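
The difference between which() and which.min() can be seen on the tied group alone. The sketch below uses only base R; the vectors age and name are illustrative copies of group B from the second data set, not part of the original answer.

```r
# Group B of the second data set: the minimum age 31 occurs twice
age  <- c(31, 31, 35, 58)
name <- c("Sarah", "Jim", "Fred", "James")

all_min   <- which(age == min(age))  # every position holding the minimum: 1 2
first_min <- which.min(age)          # only the first such position: 1

name[all_min]    # "Sarah" "Jim"  (two values, too many for one summary cell)
name[first_min]  # "Sarah"        (always a single value)
```

Because which.min(age) always has length one here, Name[which.min(Age)] is safe to use inside summarize().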

To get all values, use e.g. paste with an appropriate collapse argument, which joins the tied names into a single string per group.

data %>% group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = paste(Name[which(Age == min(Age))], collapse = ", "), 
            maxAge = max(Age), maxAgeName = paste(Name[which(Age == max(Age))], collapse = ", "))
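
As a small base R illustration of why collapse helps, the sketch below joins the tied names of group B into one string. The variable names tied and min_names are hypothetical and used only for this example.

```r
# Group B of the second data set: Sarah and Jim share the minimum age
age  <- c(31, 31, 35, 58)
name <- c("Sarah", "Jim", "Fred", "James")

tied <- name[which(age == min(age))]       # character vector of length 2
min_names <- paste(tied, collapse = ", ")  # one string joining all tied names

length(tied)       # 2
length(min_names)  # 1
min_names          # "Sarah, Jim"
```

The collapsed result has length one regardless of how many names are tied, which is what a one-row-per-group summary needs.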
