How to pass multiple group_by arguments and a dynamic variable argument to a dplyr function


Problem description


I am trying to pass multiple group_by arguments to a dplyr function as well as a named variable. I understand that I need to use a quosure for dplyr to understand the variables I am passing to it. The following code works fine:

quantileMaker2 <- function(data, groupCol, calcCol) {
  groupCol <- enquo(groupCol)
  calcCol <- enquo(calcCol)

  data %>%
    group_by(!! groupCol) %>%
    summarise('25%' = currency(quantile(!! calcCol, probs = 0.25), digits = 2L),
              '50%' = currency(quantile(!! calcCol, probs = 0.50), digits = 2L),
              '75%' = currency(quantile(!! calcCol, probs = 0.75), digits = 2L),
              avg = currency(mean(!! calcCol), digits = 2L),
              # backticks refer to the column; a quoted string would be a constant
              nAgencies = n_distinct(`POSIT ID`),
              nFTEs = sum(FTEs)
    )
}

quantileMaker2(df, employerClass, TCCperFTE)

However, when I run the following, I have a problem:

quantileMaker3 <- function(data, ..., calcCol) {
  groupCol <- quos(...)
  # quo() captures the symbol `calcCol` itself, not the column the caller supplied
  calcCol <- quo(calcCol)

  data %>%
    group_by(!!! groupCol) %>%
    summarise('25%' = currency(quantile(!! calcCol, probs = 0.25), digits = 2L),
              '50%' = currency(quantile(!! calcCol, probs = 0.50), digits = 2L),
              '75%' = currency(quantile(!! calcCol, probs = 0.75), digits = 2L),
              avg = currency(mean(!! calcCol), digits = 2L),
              # backticks refer to the column; a quoted string would be a constant
              nAgencies = n_distinct(`POSIT ID`),
              nFTEs = sum(FTEs)
    )
}

This returns the following error:

 Error in summarise_impl(.data, dots) : 
  Evaluation error: anyNA() applied to non-(list or vector) of type 'symbol'. 

Sample data:

Year    employerClass   TCCperFTE   FTEs    POSIT ID
2014    One             5000        20      1
2014    Two             1000        30      2
2015    One             15000       40      1
2015    Two             50000       50      2
2016    One             100000      60      1
2016    Two             500000      70      2

Any help you guys could give would be much appreciated.

Solution

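I've used the built-in mtcars data frame rather than your data, and your function works once it is modified in two ways. First, calcCol is captured with enquo() instead of quo(), so the quosure holds the column the caller passed rather than the symbol calcCol. Second, calcCol is moved ahead of ..., because an argument that follows ... can only be matched by name. Here cyl and hp stand in for POSIT ID and FTEs.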

library(tidyverse)
library(formattable)

quantileMaker3 <- function(data, calcCol, ...) {
  groupCol <- quos(...)
  calcCol <- enquo(calcCol)

  data %>%
    group_by(!!!groupCol) %>%
    summarise('25%' = currency(quantile(!!calcCol, probs = 0.25), digits = 2L),
              '50%' = currency(quantile(!!calcCol, probs = 0.50), digits = 2L),
              '75%' = currency(quantile(!!calcCol, probs = 0.75), digits = 2L),
              avg = currency(mean(!!calcCol), digits = 2L),
              nAgencies = n_distinct(cyl), 
              nFTEs = sum(hp)
    )
}

quantileMaker3(mtcars, mpg, cyl)

# A tibble: 3 x 7
    cyl `25%`             `50%`             `75%`             avg               nAgencies nFTEs
  <dbl> <S3: formattable> <S3: formattable> <S3: formattable> <S3: formattable>     <int> <dbl>
1    4. $22.80            $26.00            $30.40            $26.66                    1  909.
2    6. $18.65            $19.70            $21.00            $19.74                    1  856.
3    8. $14.40            $15.20            $16.25            $15.10                    1 2929.
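
The change that matters in the function above is enquo() in place of quo(). Here is a minimal sketch of the difference. The two helper names are hypothetical and exist only for illustration, and as_label() assumes rlang 0.4.0 or later.

```r
library(rlang)

# Hypothetical helpers, for illustration only.
capture_with_quo <- function(x) quo(x)      # quotes the expression it is given: the symbol `x`
capture_with_enquo <- function(x) enquo(x)  # quotes what the caller supplied for `x`

quo_result <- capture_with_quo(mpg)
enquo_result <- capture_with_enquo(mpg)

as_label(quo_result)    # "x"
as_label(enquo_result)  # "mpg"
```

Inside a function, quo(calcCol) therefore never sees the column name that was passed in, which is why the summarise() call in the question ends up working on something that is not a vector.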

With multiple grouping arguments:

quantileMaker3(mtcars, mpg, cyl, vs)

# A tibble: 5 x 8
# Groups:   cyl [?]
    cyl    vs `25%`             `50%`             `75%`             avg               nAgencies nFTEs
  <dbl> <dbl> <S3: formattable> <S3: formattable> <S3: formattable> <S3: formattable>     <int> <dbl>
1    4.    0. $26.00            $26.00            $26.00            $26.00                    1   91.
2    4.    1. $22.80            $25.85            $30.40            $26.73                    1  818.
3    6.    0. $20.35            $21.00            $21.00            $20.57                    1  395.
4    6.    1. $18.03            $18.65            $19.75            $19.12                    1  461.
5    8.    0. $14.40            $15.20            $16.25            $15.10                    1 2929.
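
For what it's worth, more recent dplyr releases let you skip the explicit quos()/enquo() step. The sketch below assumes dplyr 1.0.0 or later (for the embrace operator and the .groups argument). The name quantile_summary is hypothetical, and the currency formatting is left out to keep the example free of extra packages.

```r
library(dplyr)

# Sketch only: `quantile_summary` is a hypothetical name, not part of any package.
quantile_summary <- function(data, calcCol, ...) {
  data %>%
    group_by(...) %>%  # grouping columns are forwarded as-is
    summarise(
      `25%` = quantile({{ calcCol }}, probs = 0.25, names = FALSE),
      `50%` = quantile({{ calcCol }}, probs = 0.50, names = FALSE),
      `75%` = quantile({{ calcCol }}, probs = 0.75, names = FALSE),
      avg   = mean({{ calcCol }}),
      .groups = "drop"  # return an ungrouped tibble
    )
}

result <- quantile_summary(mtcars, mpg, cyl, vs)
```

Here {{ calcCol }} does the same job as enquo() followed by !!, and passing ... straight to group_by() replaces quos() followed by !!!.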

Incidentally, you can avoid multiple calls to quantile by using nesting. This won't work if any of the output columns are of class formattable (which is what the currency function returns), so I've changed the function to create strings for the currency-format columns.

quantileMaker3 <- function(data, calcCol, ..., quantiles=c(0.25,0.5,0.75)) {

  groupCol <- quos(...)
  calcCol <- enquo(calcCol)

  data %>%
    group_by(!!!groupCol) %>%
    summarise(values = list(paste0("$", sprintf("%1.2f", quantile(!!calcCol, probs=quantiles)))),
              qnames = list(sprintf("%1.0f%%", quantiles*100)),
              nAgencies = n_distinct(cyl), 
              nFTEs = sum(hp),
              avg = paste0("$", sprintf("%1.2f", mean(!!calcCol)))
    ) %>% 
    unnest %>% 
    spread(qnames, values) 
}

quantileMaker3(mtcars, mpg, cyl, vs)

# A tibble: 5 x 8
# Groups:   cyl [3]
    cyl    vs nAgencies nFTEs avg    `25%`  `50%`  `75%` 
  <dbl> <dbl>     <int> <dbl> <chr>  <chr>  <chr>  <chr> 
1    4.    0.         1   91. $26.00 $26.00 $26.00 $26.00
2    4.    1.         1  818. $26.73 $22.80 $25.85 $30.40
3    6.    0.         1  395. $20.57 $20.35 $21.00 $21.00
4    6.    1.         1  461. $19.12 $18.03 $18.65 $19.75
5    8.    0.         1 2929. $15.10 $14.40 $15.20 $16.25
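
If you are on newer package versions, the same list-column idea can be written with an explicit column selection in unnest() and with pivot_wider() in place of spread(). This is a sketch under assumptions: it expects dplyr 1.0.0 or later and tidyr 1.0.0 or later, the name quantile_wide is hypothetical, and the nAgencies and nFTEs columns are omitted for brevity.

```r
library(dplyr)
library(tidyr)

# Sketch only: `quantile_wide` is a hypothetical name.
quantile_wide <- function(data, calcCol, ..., quantiles = c(0.25, 0.5, 0.75)) {
  data %>%
    group_by(...) %>%
    summarise(
      # one list element per group, holding all requested quantiles as strings
      values = list(paste0("$", sprintf("%1.2f",
                 quantile({{ calcCol }}, probs = quantiles, names = FALSE)))),
      qnames = list(sprintf("%1.0f%%", quantiles * 100)),
      avg    = paste0("$", sprintf("%1.2f", mean({{ calcCol }}))),
      .groups = "drop"
    ) %>%
    unnest(c(values, qnames)) %>%  # expand both list columns in parallel
    pivot_wider(names_from = qnames, values_from = values)
}

wide <- quantile_wide(mtcars, mpg, cyl, vs)
```

Because the quantiles argument is a plain vector, extra cut points such as quantiles = c(0.1, 0.9) produce extra columns without any change to the function body.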
