dplyr idiom for "select A, B, max(C) from D group by A"

Published: 2017/7/13 22:12:31 · Tags: r, dplyr


Question


I am looking for a dplyr idiom for SQL group by queries with several result columns. For example:

library(dplyr)
library(sqldf)

df <- data.frame(
  fuel=rep(c("Coal", "Gas"), each=3), 
  year=rep(c(1998,1999,2000), 2),
  percent=c(20,30,40,80,70,60)) 

sqldf("select fuel, year, max(percent) from df group by fuel")

  fuel year max(percent)
1 Coal 2000           40
2  Gas 1998           80


The sqldf query supplies the year in which a given fuel reached its maximum percentage (ignoring ties). What is the best way to do this using dplyr? Simply doing:

group_by(df,fuel) %>% summarise(max(percent))

gives:

  fuel max(percent)
1 Coal           40
2  Gas           80


and there does not seem to be a place to add an extra result column. I can do it indirectly by using mutate:

group_by(df,fuel) %>% mutate(maxp=max(percent)) %>% 
   filter(percent==maxp) %>% select(-percent)


Is that the best/only way?

Answer

A few more options:


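Using distinct() (this is similar to slice(which.max(percent)), but it avoids a grouped operation and is therefore probably more efficient). In dplyr 0.5.0 and later, distinct() keeps only the listed columns unless .keep_all = TRUE is given:

df %>% 
  arrange(desc(percent)) %>%
  distinct(fuel, .keep_all = TRUE)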

#   fuel year percent
# 1  Gas 1998      80
# 2 Coal 2000      40
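For comparison, here is a minimal sketch of the grouped slice(which.max(percent)) version mentioned above, assuming the same df as in the question. which.max() returns the position of the first maximum, so ties are broken by row order and exactly one row per fuel comes back.

```r
library(dplyr)

# Same toy data as in the question
df <- data.frame(
  fuel = rep(c("Coal", "Gas"), each = 3),
  year = rep(c(1998, 1999, 2000), 2),
  percent = c(20, 30, 40, 80, 70, 60)
)

# Within each fuel, keep only the row at the position of the first maximum
res_slice <- df %>%
  group_by(fuel) %>%
  slice(which.max(percent)) %>%
  ungroup()

res_slice
```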


Or using filter() (this keeps every row that ties for the group maximum)

df %>% 
  group_by(fuel) %>% 
  filter(percent == max(percent))
# Source: local data frame [2 x 3]
# Groups: fuel [2]
# 
#     fuel  year percent
#   (fctr) (dbl)   (dbl)
# 1   Coal  2000      40
# 2    Gas  1998      80
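The toy data has no ties, so filter() and distinct() return the same two rows. A small sketch of where they differ, using a hypothetical extra row (Coal at 40 percent in a made-up year 2001) that is not part of the question's data. It assumes dplyr 0.5.0 or later, where distinct() needs .keep_all = TRUE to keep year and percent.

```r
library(dplyr)

# Hypothetical data: the question's df plus an invented tie for Coal in 2001
df_ties <- data.frame(
  fuel = c(rep("Coal", 4), rep("Gas", 3)),
  year = c(1998, 1999, 2000, 2001, 1998, 1999, 2000),
  percent = c(20, 30, 40, 40, 80, 70, 60)
)

# filter() keeps every row equal to the group maximum: two rows for Coal
all_max <- df_ties %>%
  group_by(fuel) %>%
  filter(percent == max(percent)) %>%
  ungroup()

# arrange() + distinct() keeps only the first row per fuel after sorting
one_max <- df_ties %>%
  arrange(desc(percent)) %>%
  distinct(fuel, .keep_all = TRUE)

all_max
one_max
```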


Or using top_n() (same result as filter(percent == max(percent)), ties included)

df %>% 
  group_by(fuel) %>% 
  top_n(n = 1, percent) # If percent is always the last column, you can just do top_n(n = 1)

# Source: local data frame [2 x 3]
# Groups: fuel [2]
# 
#     fuel  year percent
#   (fctr) (dbl)   (dbl)
# 1   Coal  2000      40
# 2    Gas  1998      80
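In dplyr 1.0.0 and later, top_n() is superseded by slice_max(). A sketch of the equivalent call, assuming a current dplyr and the question's df: with_ties = TRUE (the default) keeps tied rows as top_n() does, while with_ties = FALSE would return exactly n rows per group.

```r
library(dplyr)  # slice_max() requires dplyr >= 1.0.0

df <- data.frame(
  fuel = rep(c("Coal", "Gas"), each = 3),
  year = rep(c(1998, 1999, 2000), 2),
  percent = c(20, 30, 40, 80, 70, 60)
)

# Take the row(s) with the largest percent within each fuel
res_max <- df %>%
  group_by(fuel) %>%
  slice_max(percent, n = 1, with_ties = TRUE) %>%
  ungroup()

res_max
```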


Or using summarise() and left_join() (same result as the two options above)

df %>% 
  group_by(fuel) %>%
  summarise(percent = max(percent)) %>%
  left_join(., df)

# Joining by: c("fuel", "percent")
# Source: local data frame [2 x 3]
# 
#     fuel percent  year
#   (fctr)   (dbl) (dbl)
# 1   Coal      40  2000
# 2    Gas      80  1998
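The question also suggested that summarise() leaves no place for an extra result column. As an additional hedged variant, summarise() can return several columns per group as long as each reduces to one value, so indexing year by which.max(percent) gives the year of the first maximum, roughly mirroring the sqldf query (ties ignored). The output name max_percent is an illustrative choice; it avoids masking percent before year is computed.

```r
library(dplyr)

df <- data.frame(
  fuel = rep(c("Coal", "Gas"), each = 3),
  year = rep(c(1998, 1999, 2000), 2),
  percent = c(20, 30, 40, 80, 70, 60)
)

# Each summary uses the original percent column within the group
res_summ <- df %>%
  group_by(fuel) %>%
  summarise(
    year = year[which.max(percent)],  # year of the first maximum
    max_percent = max(percent)
  )

res_summ
```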
