How will the upcoming release of dplyr 0.6 impact my code?


Question

Within the next month dplyr 0.6 is going to be released, including a major overhaul of programming with dplyr. I read this document carefully, and I am trying to understand how it will impact my use of dplyr.

Here is a common idiom I use when building reporting and aggregation functions with dplyr:

my_report <- function(data, grouping_vars) {
  data %>%
    group_by_(.dots=grouping_vars) %>%
    summarize(x_mean=mean(x), x_median=median(x), ...)
}

Here, grouping_vars is a vector of strings.

I like this idiom because I can pass in string vectors from other places, say a file or a Shiny app's reactive UI, and it's not too bad for interactive work either.

However, in the new "Programming with dplyr" vignette, I see no examples of how something like this can be done with the new dplyr. I only see examples showing that passing strings is no longer the correct approach and that I have to use quosures instead.

I'm happy to adopt quosures, but how exactly do I get from strings to the quosures expected by dplyr here? It doesn't seem feasible to expect the entire R ecosystem to provide quosures to dplyr - lots of times we're going to get strings and they'll have to be converted.

Here is an example showing what you're now supposed to do, and how my old idiom doesn't work:

library(dplyr)
grouping_vars <- quo(am)
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#> # A tibble: 2 × 2
#>      am mean_cyl
#>   <dbl>    <dbl>
#> 1     0 6.947368
#> 2     1 5.076923

grouping_vars <- "am"
mtcars %>%
  group_by(!!grouping_vars) %>%
  summarise(mean_cyl=mean(cyl))
#> # A tibble: 1 × 2
#>   `"am"` mean_cyl
#>    <chr>    <dbl>
#> 1     am   6.1875

Solution

dplyr 0.6 will have a specialized variant of group_by, group_by_at, for grouping by several variables at once. It accepts column names as strings, so it is much easier to use this new member of the _at family:

# using the pre-release 0.6.0

cols <- c("am","gear")

mtcars %>%
    group_by_at(.vars = cols) %>%
    summarise(mean_cyl=mean(cyl))

# Source: local data frame [4 x 3]
# Groups: am [?]
#
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000
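
Applied to the reporting function from the question, the same approach might look like the sketch below. It is an illustration under assumptions, not part of the original answer: the `...` from the original summarize call is dropped, `cars` is stand-in data (mtcars with mpg renamed to x so that the column `x` exists), and `my_report_syms` is a hypothetical variant that converts the strings to symbols with rlang::syms() and splices them into a plain group_by(). In later dplyr releases the scoped verbs such as group_by_at are superseded, so treat this as a sketch for the version discussed here.

```r
library(dplyr)

# The question's my_report(), with group_by_(.dots = ...) swapped for
# group_by_at(); grouping_vars is still a character vector.
my_report <- function(data, grouping_vars) {
  data %>%
    group_by_at(.vars = grouping_vars) %>%
    summarise(x_mean = mean(x), x_median = median(x))
}

# Stand-in data (assumption): mtcars with mpg renamed to x.
cars <- rename(mtcars, x = mpg)
report <- my_report(cars, c("am", "gear"))

# Hypothetical alternative: turn the strings into symbols and splice
# them into an ordinary group_by() call with !!!.
my_report_syms <- function(data, grouping_vars) {
  data %>%
    group_by(!!!rlang::syms(grouping_vars)) %>%
    summarise(x_mean = mean(x), x_median = median(x))
}
report_syms <- my_report_syms(cars, c("am", "gear"))
```

Both functions keep the calling convention from the question, so string vectors read from a file or a Shiny input can be passed through unchanged.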

The .vars argument accepts either a character or numeric vector, or a list of columns generated by vars(). From the documentation:

.vars

A list of columns generated by vars(), or a character vector of column names, or a numeric vector of column positions.
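
The three forms of .vars described above can be compared side by side. This is a minimal sketch on mtcars; the object names `by_name`, `by_vars`, and `by_position` are made up for illustration, and the column positions are looked up with match() rather than hard-coded.

```r
library(dplyr)

# 1. Character vector of column names
by_name <- mtcars %>% group_by_at(.vars = c("am", "gear"))

# 2. List of columns generated by vars()
by_vars <- mtcars %>% group_by_at(.vars = vars(am, gear))

# 3. Numeric vector of column positions
positions <- match(c("am", "gear"), names(mtcars))
by_position <- mtcars %>% group_by_at(.vars = positions)

# Each result should report the same grouping variables
group_vars(by_name)
group_vars(by_vars)
group_vars(by_position)
```

The character form is the one that matches the question's use case, since the grouping columns arrive as strings.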
