How to parametrize function calls in dplyr 0.7?
Question
The release of dplyr 0.7 includes a major overhaul of programming with dplyr. I read this document carefully, and I am trying to understand how it will impact my use of dplyr.
Here is a common idiom I use when building reporting and aggregation functions with dplyr:
my_report <- function(data, grouping_vars) {
  data %>%
    group_by_(.dots=grouping_vars) %>%
    summarize(x_mean=mean(x), x_median=median(x), ...)
}
Here, grouping_vars is a vector of strings.
I like this idiom because I can pass in string vectors from other places, say a file or a Shiny app's reactive UI, but it's also not too bad for interactive work either.
However, in the new programming with dplyr vignette, I see no examples of how something like this can be done with the new dplyr. I only see examples of how passing strings is no longer the correct approach, and I have to use quosures instead.
I'm happy to adopt quosures, but how exactly do I get from strings to the quosures expected by dplyr here? It doesn't seem feasible to expect the entire R ecosystem to provide quosures to dplyr - lots of times we're going to get strings and they'll have to be converted.
Here is an example showing what you're now supposed to do, and how my old idiom doesn't work:
library(dplyr)
grouping_vars <- quo(am)
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#> # A tibble: 2 × 2
#>      am mean_cyl
#>   <dbl>    <dbl>
#> 1     0 6.947368
#> 2     1 5.076923

grouping_vars <- "am"
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#> # A tibble: 1 × 2
#>   `"am"` mean_cyl
#>    <chr>    <dbl>
#> 1     am   6.1875
Recommended answer
dplyr will have a specialized group_by function, group_by_at, to deal with multiple grouping variables. It would be much easier to use this new member of the _at family:
# using the pre-release 0.6.0
cols <- c("am","gear")
mtcars %>%
  group_by_at(.vars = cols) %>%
  summarise(mean_cyl=mean(cyl))
# Source: local data frame [4 x 3]
# Groups: am [?]
#
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000
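Applied to the reporting idiom from the question, this might look like the sketch below. The my_report shown here is a hypothetical rewrite rather than code from the original answer, and it summarises cyl from mtcars as a stand-in for the question's x column.

```r
library(dplyr)

# Hypothetical rewrite of the question's my_report():
# grouping_vars stays a plain character vector, as in the old group_by_() idiom.
my_report <- function(data, grouping_vars) {
  data %>%
    group_by_at(.vars = grouping_vars) %>%   # accepts column names as strings
    summarise(mean_cyl = mean(cyl))          # assumed summary; swap in your own columns
}

# The strings could equally come from a file or a Shiny input.
report <- my_report(mtcars, c("am", "gear"))
print(report)
```

Because the function still takes strings, callers that already pass character vectors should not need to change.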
The .vars argument accepts a character or numeric vector, or column names generated by vars():

.vars
A list of columns generated by vars(), or a character vector of column names, or a numeric vector of column positions.
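As a rough illustration of these three accepted forms, the calls below should all produce the same grouping on mtcars. The positions 9 and 10 assume the standard column order of the built-in mtcars data set, where am and gear are the ninth and tenth columns.

```r
library(dplyr)

# 1. Columns selected with vars()
by_vars  <- mtcars %>% group_by_at(vars(am, gear))

# 2. Character vector of column names
by_names <- mtcars %>% group_by_at(c("am", "gear"))

# 3. Numeric vector of column positions (assumes am = 9, gear = 10 in mtcars)
by_pos   <- mtcars %>% group_by_at(c(9, 10))

# group_vars() reports the grouping columns of each result for comparison
print(group_vars(by_vars))
print(group_vars(by_names))
print(group_vars(by_pos))
```

The character form is probably the most convenient when the column names arrive as strings from outside the R session.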