How will the upcoming release of dplyr 0.6 impact my code?
Within the next month dplyr 0.6 is going to be released, including a major overhaul of programming with dplyr. I read this document carefully, and I am trying to understand how it will impact my use of dplyr.
Here is a common idiom I use when building reporting and aggregation functions with dplyr:
my_report <- function(data, grouping_vars) {
  data %>%
    group_by_(.dots = grouping_vars) %>%
    summarize(x_mean = mean(x), x_median = median(x), ...)
}
Here, grouping_vars is a vector of strings.
I like this idiom because I can pass in string vectors from other places, say a file or a Shiny app's reactive UI, and it is not too bad for interactive work either.
However, in the new "Programming with dplyr" vignette, I see no examples of how something like this can be done with the new dplyr. I only see examples showing that passing strings is no longer the correct approach and that I have to use quosures instead.
I'm happy to adopt quosures, but how exactly do I get from strings to the quosures expected by dplyr here? It doesn't seem feasible to expect the entire R ecosystem to provide quosures to dplyr. Often we are going to get strings, and they will have to be converted.
Here is an example showing what you're now supposed to do, and how my old idiom doesn't work:
library(dplyr)

grouping_vars <- quo(am)
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl = mean(cyl))
#> # A tibble: 2 × 2
#>      am mean_cyl
#>   <dbl>    <dbl>
#> 1     0 6.947368
#> 2     1 5.076923

grouping_vars <- "am"
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl = mean(cyl))
#> # A tibble: 1 × 2
#>   `"am"` mean_cyl
#>    <chr>    <dbl>
#> 1     am   6.1875
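The second result happens because unquoting a string with !! inserts the literal string "am", so group_by() creates a constant column instead of grouping by the am column. One possible bridge from strings to what group_by() expects, sketched here on the assumption that the rlang package is installed alongside dplyr 0.6 or later, is to convert the strings to symbols with rlang::sym() or rlang::syms() before unquoting. The names single_var, many_vars, by_one, and by_many are illustrative only.

```r
library(dplyr)

# Assumption: rlang is available; sym()/syms() convert strings to symbols.
single_var <- "am"
many_vars <- c("am", "gear")

# One string: convert to a symbol, then unquote it with !!
by_one <- mtcars %>%
  group_by(!!rlang::sym(single_var)) %>%
  summarise(mean_cyl = mean(cyl))

# Several strings: convert to a list of symbols, then splice them with !!!
by_many <- mtcars %>%
  group_by(!!!rlang::syms(many_vars)) %>%
  summarise(mean_cyl = mean(cyl))
```

With this approach the strings can still come from a file or a Shiny input; only the call site inside the function changes.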
dplyr will have a specialized group_by function, group_by_at, to deal with multiple grouping variables. It would be much easier to use this new member of the _at family:
# using the pre-release 0.6.0
cols <- c("am", "gear")
mtcars %>%
  group_by_at(.vars = cols) %>%
  summarise(mean_cyl = mean(cyl))
# Source: local data frame [4 x 3]
# Groups: am [?]
#
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000
The .vars argument accepts a character vector, a numeric vector, or column names generated by vars(). From the documentation:
.vars
A list of columns generated by vars(), or a character vector of column names, or a numeric vector of column positions.
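As a rough illustration of how the reporting idiom from the question might carry over, the sketch below swaps group_by_() for group_by_at(). It assumes mtcars as the data and summarises cyl in place of the question's unspecified x column; the result names by_name, by_position, and by_vars are hypothetical.

```r
library(dplyr)

# Sketch only: cyl stands in for the `x` column used in the question.
my_report <- function(data, grouping_vars) {
  data %>%
    group_by_at(.vars = grouping_vars) %>%
    summarise(x_mean = mean(cyl), x_median = median(cyl))
}

# The three forms of .vars described in the documentation
by_name <- my_report(mtcars, c("am", "gear"))    # character vector of column names
by_position <- my_report(mtcars, c(9, 10))       # numeric positions of am and gear
by_vars <- my_report(mtcars, vars(am, gear))     # columns generated by vars()
```

All three calls group by the same two columns, so a string vector read from a file or a UI can be passed straight through without building quosures by hand.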