dplyr idiom for "select A, B, max(C) from D group by C"
Question
I am looking for a dplyr idiom for SQL GROUP BY queries that return several result columns. For example:
library(dplyr)
library(sqldf)
df <- data.frame(
  fuel = rep(c("Coal", "Gas"), each = 3),
  year = rep(c(1998, 1999, 2000), 2),
  percent = c(20, 30, 40, 80, 70, 60))
sqldf("select fuel, year, max(percent) from df group by fuel")
fuel year max(percent)
1 Coal 2000 40
2 Gas 1998 80
The sqldf query returns the year in which each fuel reached its maximum percentage (ignoring ties). What is the best way to do this using dplyr? Simply doing:
group_by(df,fuel) %>% summarise(max(percent))
gives:
fuel max(percent)
1 Coal 40
2 Gas 80
and there does not seem to be a place to add an extra result column. I can do it indirectly by using mutate:
group_by(df, fuel) %>% mutate(maxp = max(percent)) %>%
  filter(percent == maxp) %>% select(-percent)
Is that the best/only way?
Recommended answer
Some more options
Using distinct (this is similar to slice(which.max(percent)), but avoids by-group operations and is therefore probably more efficient)
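df %>%
  arrange(desc(percent)) %>%
  # .keep_all = TRUE retains year and percent; newer dplyr versions
  # otherwise return only the fuel column
  distinct(fuel, .keep_all = TRUE)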
# fuel year percent
# 1 Gas 1998 80
# 2 Coal 2000 40
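The slice(which.max(percent)) variant mentioned above is not shown in the answer. A minimal sketch of it, assuming the same df as in the question (the name res_slice is only illustrative):

```r
library(dplyr)

df <- data.frame(
  fuel = rep(c("Coal", "Gas"), each = 3),
  year = rep(c(1998, 1999, 2000), 2),
  percent = c(20, 30, 40, 80, 70, 60))

# which.max() returns the position of the first maximum within each group,
# so slice() keeps exactly one row per fuel, even if there are ties.
res_slice <- df %>%
  group_by(fuel) %>%
  slice(which.max(percent)) %>%
  ungroup()

res_slice
```

Unlike the distinct approach, this works group by group, so the rows come back in group order (Coal, then Gas) rather than sorted by percent.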
Or using filter (this selects every row that attains the group maximum, so ties are kept)
df %>%
  group_by(fuel) %>%
  filter(percent == max(percent))
# Source: local data frame [2 x 3]
# Groups: fuel [2]
#
# fuel year percent
# (fctr) (dbl) (dbl)
# 1 Coal 2000 40
# 2 Gas 1998 80
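The example data contains no ties, so the difference between "all maxima" and "one row per group" is not visible in the output. The sketch below uses a small hypothetical data frame (df_ties, not from the question) in which Coal reaches its maximum in two different years, and compares filter with slice(which.max(percent)):

```r
library(dplyr)

# Hypothetical data: Coal hits its maximum of 40 in both 1998 and 2000
df_ties <- data.frame(
  fuel = c("Coal", "Coal", "Coal", "Gas", "Gas"),
  year = c(1998, 1999, 2000, 1998, 1999),
  percent = c(40, 30, 40, 80, 70))

# filter() keeps every row equal to the group maximum (both Coal rows)
all_max <- df_ties %>%
  group_by(fuel) %>%
  filter(percent == max(percent)) %>%
  ungroup()

# slice(which.max()) keeps only the first maximum in each group
first_max <- df_ties %>%
  group_by(fuel) %>%
  slice(which.max(percent)) %>%
  ungroup()

nrow(all_max)    # 3 rows: two for Coal, one for Gas
nrow(first_max)  # 2 rows: one per fuel
```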
Or using top_n (similar result to filter(percent == max(percent)))
df %>%
  group_by(fuel) %>%
  top_n(n = 1, percent) # If percent is always the last column, you can just do top_n(n = 1)
# Source: local data frame [2 x 3]
# Groups: fuel [2]
#
# fuel year percent
# (fctr) (dbl) (dbl)
# 1 Coal 2000 40
# 2 Gas 1998 80
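In more recent dplyr releases, top_n() is documented as superseded by slice_max(). A minimal sketch of the equivalent call, assuming dplyr 1.0.0 or later and the same df as in the question:

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where slice_max() is available

df <- data.frame(
  fuel = rep(c("Coal", "Gas"), each = 3),
  year = rep(c(1998, 1999, 2000), 2),
  percent = c(20, 30, 40, 80, 70, 60))

# slice_max() keeps ties by default (with_ties = TRUE), like top_n();
# pass with_ties = FALSE to get exactly one row per group.
res_max <- df %>%
  group_by(fuel) %>%
  slice_max(percent, n = 1) %>%
  ungroup()

res_max
```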
Or using summarise and left_join (similar result to the two above)
df %>%
  group_by(fuel) %>%
  summarise(percent = max(percent)) %>%
  left_join(., df)
# Joining by: c("fuel", "percent")
# Source: local data frame [2 x 3]
#
# fuel percent year
# (fctr) (dbl) (dbl)
# 1 Coal 40 2000
# 2 Gas 80 1998
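One further variant, which is not part of the answer above, puts the extra column directly into summarise by indexing year with which.max(percent). This is only a sketch under the assumption that ties can be ignored, as in the original SQL query; res_sum is an illustrative name:

```r
library(dplyr)

df <- data.frame(
  fuel = rep(c("Coal", "Gas"), each = 3),
  year = rep(c(1998, 1999, 2000), 2),
  percent = c(20, 30, 40, 80, 70, 60))

# year is computed first, while percent still refers to the full group
# vector; once percent is overwritten by its maximum, which.max(percent)
# would no longer point at the right row.
res_sum <- df %>%
  group_by(fuel) %>%
  summarise(year = year[which.max(percent)],
            percent = max(percent))

res_sum
```

This avoids the join, at the cost of having to list each extra column explicitly.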