Specifying multiple variables to group by via explicit argument with unquoted elements


Question

Based on a section of Programming with dplyr, I am trying to specify

  1. multiple variables to group by via dplyr::group_by
  2. without relying on ... but using an explicit list argument group_vars instead
  3. without needing to quote the list elements in arg group_vars

Example data

df <- tibble::tribble(
  ~a,   ~b,  ~c,
  "A",  "a", 10,
  "A",  "a", 20,
  "A",  "b", 1000,
  "B",  "a", 5,
  "B",  "b", 1
)

Approach based on ... from Programming with dplyr:

# Approach 1 -----
my_summarise <- function(df, ...) {
  group_vars <- dplyr::enquos(...)

  df %>%
    dplyr::group_by(!!!group_vars) %>%
    dplyr::summarise(x = mean(c))
}

my_summarise(df, a, b)
#> # A tibble: 4 x 3
#> # Groups:   a [2]
#>   a     b         x
#>   <chr> <chr> <dbl>
#> 1 A     a        15
#> 2 A     b      1000
#> 3 B     a         5
#> 4 B     b         1

Approach based on a list argument with quoted elements:

# Approach 2 -----
my_summarise_2 <- function(df, group_vars = c("a", "b")) {
  group_vars <- dplyr::syms(group_vars)

  df %>%
    dplyr::group_by(!!!group_vars) %>%
    dplyr::summarise(x = mean(c))
}

my_summarise_2(df)
#> # A tibble: 4 x 3
#> # Groups:   a [2]
#>   a     b         x
#>   <chr> <chr> <dbl>
#> 1 A     a        15
#> 2 A     b      1000
#> 3 B     a         5
#> 4 B     b         1

my_summarise_2(df, group_vars = "a")
#> # A tibble: 2 x 2
#>   a         x
#>   <chr> <dbl>
#> 1 A      343.
#> 2 B        3

I can't find an approach that lets me supply unquoted column names:

# Approach 3 -----
my_summarise_3 <- function(df, group_vars = list(a, b)) {
  group_vars <- dplyr::enquos(group_vars)

  df %>%
    dplyr::group_by(!!!group_vars) %>%
    dplyr::summarise(x = mean(c))
}

my_summarise_3(df)
#> Error: Column `list(a, b)` must be length 5 (the number of rows) or one, not 2

I guess the crucial thing is to end up with the same list structure as the one produced by calling group_vars <- dplyr::enquos(...):

<list_of<quosure>>

[[1]]
<quosure>
expr: ^a
env:  global

[[2]]
<quosure>
expr: ^b
env:  global

I tried to tackle it with group_vars %>% purrr::map(dplyr::enquo), but of course R complains about a and b, because they have to be evaluated first.

Recommended answer

The main issue is that list(a, b) does not capture the unevaluated expressions a and b. Instead, it evaluates those expressions and creates a two-element list holding the results. You basically have two options:

Solution one: Use rlang::exprs() to capture the actual expressions. Since the expressions arrive already captured (unevaluated), you no longer need enquos inside your function, which simply becomes

my_summarise_3 <- function(df, group_vars = rlang::exprs(a, b)) {
  df %>%
    dplyr::group_by(!!!group_vars) %>%
    dplyr::summarise(x = mean(c))
}

my_summarise_3(df)
# # A tibble: 4 x 3
# # Groups:   a [2]
#   a     b         x
#   <chr> <chr> <dbl>
# 1 A     a        15
# 2 A     b      1000
# 3 B     a         5
# 4 B     b         1
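
To make the difference between evaluating and capturing concrete, here is a minimal sketch. The values assigned to a and b are arbitrary and exist only for illustration; they are not part of the original question or answer.

```r
# Illustrative values only; not part of the original example data
a <- 1
b <- 2

# list() evaluates its arguments: the result holds the values 1 and 2
evaluated <- list(a, b)

# rlang::exprs() captures its arguments: the result holds the symbols a and b
captured <- rlang::exprs(a, b)

str(evaluated)
print(captured)
```

This is why group_by(!!!group_vars) sees two values rather than two column names when the default is list(a, b), and two column names when the default is rlang::exprs(a, b).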

The downside of the rlang::exprs() interface is that the user is now responsible for quoting (i.e., capturing the expressions of) the arguments:

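# Note that it can be done using quote() from base R
# (newer rlang versions may warn when !!! splices a bare symbol;
#  group_vars = rlang::exprs(a) sidesteps that)
my_summarise_3(df, group_vars = quote(a))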
# # A tibble: 2 x 2
#   a         x
#   <chr> <dbl>
# 1 A      343.
# 2 B        3 
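
If the quosure list shown in the question is what you are after, rlang::quos() builds that structure directly. The sketch below is a variant of solution one, not part of the original answer: the function name my_summarise_q is hypothetical, and the caller still has to wrap the columns in quos().

```r
library(dplyr)

df <- tibble::tribble(
  ~a,   ~b,  ~c,
  "A",  "a", 10,
  "A",  "a", 20,
  "A",  "b", 1000,
  "B",  "a", 5,
  "B",  "b", 1
)

# Hypothetical variant of solution one: the default is a list of quosures
my_summarise_q <- function(df, group_vars = rlang::quos(a, b)) {
  df %>%
    group_by(!!!group_vars) %>%
    summarise(x = mean(c))
}

# Group by the default columns a and b
res_default <- my_summarise_q(df)

# Group by a only; the caller does the capturing with quos()
res_a <- my_summarise_q(df, group_vars = rlang::quos(a))
```

Compared with exprs(), quos() also records the environment each expression came from, which matters only if the grouping expressions refer to objects outside the data frame.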

Solution two: Capture the unevaluated expression list(a, b) in its entirety and parse it by hand.

## Helper function to recursively construct an abstract syntax tree
## (assumes %>% is available, e.g. via library(dplyr) or library(magrittr))
getAST <- function(ee) { as.list(ee) %>% purrr::map_if(is.call, getAST) }

my_summarise_3 <- function(df, group_vars = list(a,b)) {
  ## Capture the expression and parse it
  ast <- rlang::enexpr(group_vars) %>% getAST()

  ## Identify symbols present in the data
  gvars <- unlist(ast) %>% purrr::map_chr(deparse) %>%
      intersect(names(df)) %>% rlang::syms()

  df %>%
      dplyr::group_by(!!!gvars) %>%
      dplyr::summarise(x = mean(c))
}

my_summarise_3(df, list(a,b))
# # A tibble: 4 x 3
# # Groups:   a [2]
#   a     b         x
#   <chr> <chr> <dbl>
# 1 A     a        15
# 2 A     b      1000
# 3 B     a         5
# 4 B     b         1

my_summarise_3(df, b)
# # A tibble: 2 x 2
#   b         x
#   <chr> <dbl>
# 1 a      11.7
# 2 b     500. 
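
The AST walk above keeps any symbol that matches a column name, wherever it appears in the captured expression. A stricter alternative, sketched here under the assumption that callers pass either a bare column name or a single list(...) or c(...) call, is to take the arguments of that call directly. The name my_summarise_4 is hypothetical and not part of the original answer.

```r
library(dplyr)

df <- tibble::tribble(
  ~a,   ~b,  ~c,
  "A",  "a", 10,
  "A",  "a", 20,
  "A",  "b", 1000,
  "B",  "a", 5,
  "B",  "b", 1
)

# Hypothetical helper: assumes group_vars is a bare symbol or one list()/c() call
my_summarise_4 <- function(df, group_vars = list(a, b)) {
  # Capture the argument without evaluating it
  expr <- rlang::enexpr(group_vars)

  # as.list() on the call list(a, b) gives the function name followed by a and b,
  # so drop the first element; a bare symbol is wrapped in a one-element list
  gvars <- if (is.call(expr)) as.list(expr)[-1] else list(expr)

  df %>%
    group_by(!!!gvars) %>%
    summarise(x = mean(c))
}

res_ab <- my_summarise_4(df, list(a, b))
res_b <- my_summarise_4(df, b)
```

Unlike the intersect(names(df)) version, this sketch does not check the symbols against the data, so a misspelled column name surfaces as an error from group_by rather than being silently dropped.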
