Using purrr and dplyr: is rlang::sym the best way?


Question

I'd like to write functions that use dplyr verbs, which means I have to wade into the murky waters of rlang.

To give a concrete example, say I want to use purrr::map_df() to iterate over variables in a dplyr::group_by(). The "Programming with dplyr" vignette walks through writing a my_summarise() function; the approach there is to call rlang::enquo() on the grouping variable, then unquote it with !!. That approach produces a new dplyr-like function that takes unquoted variable names (my_summarise(df, g1) in the vignette).

In contrast, I want to provide the variable name as a string. Is rlang::sym() the right way to do this? It seems like it isn't, because sym() isn't mentioned in the dplyr programming vignette and is barely mentioned in the rlang tidy evaluation article. Is there a better way?

library(tidyverse)
my_summarise <- function(df, group_var) {
  group_var <- rlang::sym(group_var)

  df %>%
    group_by(!!group_var) %>%
    summarise(mpg = mean(mpg))
}

# This works. Is that a good thing?
purrr::map_df(c("cyl", "am"), my_summarise, df = mtcars)

# A tibble: 5 x 3
    cyl   mpg    am
  <dbl> <dbl> <dbl>
1  4.00  26.7 NA   
2  6.00  19.7 NA   
3  8.00  15.1 NA   
4 NA     17.1  0   
5 NA     24.4  1.00
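One alternative that the question does not mention is the `.data` pronoun, which looks a column up by its string name inside a data-masking verb. The sketch below assumes a dplyr version that supports `.data[[...]]` in group_by(); `my_summarise_data` is a hypothetical name used only for illustration, and the results are kept as a list rather than row-bound so each summary can be inspected separately.

```r
library(dplyr)
library(purrr)

# Hypothetical variant of my_summarise(): the column name arrives as a string
# and is looked up through the .data pronoun instead of being converted with sym().
my_summarise_data <- function(df, group_var) {
  df %>%
    group_by(.data[[group_var]]) %>%
    summarise(mpg = mean(mpg))
}

# One summary table per grouping variable.
res <- map(c("cyl", "am"), my_summarise_data, df = mtcars)
res
```

Both approaches accept strings; which one reads better is largely a matter of taste and of the dplyr/rlang versions in use.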

As a follow-up, why does simply unquoting (without first applying enquo or sym) work some of the time? In the example below, why does select() work as expected while group_by() doesn't?

x <- "cyl"
select(mtcars, !!x)
group_by(mtcars, !!x)

Update: the answer is not about unquoting. It's that select() is more flexible and can handle strings, while group_by() can't.
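The difference can be seen directly by counting groups. This is a small sketch using only dplyr and rlang functions; the use of all_of() is an addition here (it makes the string selection explicit) and is not part of the original question.

```r
library(dplyr)

x <- "cyl"

# select() is a selection verb, so a string naming a column is accepted.
# all_of() states explicitly that x holds column names.
sel <- select(mtcars, all_of(x))

# group_by() is a data-masking verb: the unquoted string is treated as a
# constant value, so every row falls into the same single group.
g_string <- group_by(mtcars, !!x)

# Converting the string to a symbol first groups by the actual cyl column.
g_symbol <- group_by(mtcars, !!rlang::sym(x))

n_groups(g_string)  # 1
n_groups(g_symbol)  # 3
```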

Other ref: This blog post by Edwin Thoen.

Recommended answer

Short answer: yes.

If you want to map over columns, sym is a fine way to do it. Lionel Henry demonstrates sym in the draft vignette.

In cases where you want to pass a column name but aren't trying to iterate, Kirill Müller prefers quo. In the example below, the two have the same effect.

library(dplyr)

x <- rlang::quo(cyl)
y <- rlang::sym("cyl")
identical(group_by(mtcars, !!x), group_by(mtcars, !!y))  # TRUE
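If several column names arrive as strings at once, the plural rlang::syms() together with the splice operator !!! extends the same idea. This goes slightly beyond what the answer covers, so treat it as a sketch; the `.groups` argument assumes dplyr 1.0 or later.

```r
library(dplyr)

vars <- c("cyl", "am")

# syms() turns a character vector into a list of symbols;
# !!! splices that list into group_by() as separate arguments.
group_syms <- rlang::syms(vars)

by_both <- mtcars %>%
  group_by(!!!group_syms) %>%
  summarise(mpg = mean(mpg), .groups = "drop")

by_both
```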
