Standard evaluation in dplyr: summarise a variable given as a character string
Question
Update, July 2020:
dplyr 1.0 has changed pretty much everything about this question as well as all of the answers. See the dplyr programming vignette here:
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
The new way to refer to columns when their identifier is stored as a character vector is to use the .data pronoun from rlang, and then subset it as you would in base R.
library(dplyr)
key <- "v3"
val <- "v2"
drp <- "v1"
df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
df %>%
  select(-matches(drp)) %>%
  group_by(.data[[key]]) %>%
  summarise(total = sum(.data[[val]], na.rm = TRUE))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 2
#>   v3    total
#>   <chr> <int>
#> 1 A        21
#> 2 B        19
If your code is in a package function, you can use @importFrom rlang .data to avoid R CMD check notes about undefined globals.
Original question:
I want to refer to an unknown column name inside a summarise. The standard evaluation functions introduced in dplyr 0.3 allow column names to be referenced using variables, but this doesn't appear to work when you call a base R function within, for example, a summarise.
library(dplyr)
key <- "v3"
val <- "v2"
drp <- "v1"
df <- data_frame(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
df looks like this:
> df
Source: local data frame [5 x 3]
v1 v2 v3
1 1 6 A
2 2 7 A
3 3 8 A
4 4 9 B
5 5 10 B
I want to drop v1, group by v3, and sum v2 for each group:
df %>% select(-matches(drp)) %>% group_by_(key) %>% summarise_(sum(val, na.rm = TRUE))
Error in sum(val, na.rm = TRUE) : invalid 'type' (character) of argument
The NSE version of select() works fine, since it can match a character string. The SE version of group_by() works fine, since it can now accept variables as arguments and evaluate them. However, I haven't found a way to achieve similar results when using base R functions inside dplyr functions.
Things that don't work:
df %>% group_by_(key) %>% summarise_(sum(get(val), na.rm = TRUE))
Error in get(val) : object 'v2' not found
df %>% group_by_(key) %>% summarise_(sum(eval(as.symbol(val)), na.rm = TRUE))
Error in eval(expr, envir, enclos) : object 'v2' not found
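Both error messages are consistent with the expression being evaluated in the calling environment, where v2 is a column of df rather than a standalone object. The following base R sketch illustrates that scoping issue only; it is an assumption about the cause, not a description of dplyr internals, and it uses a plain data.frame instead of data_frame() so that it runs without dplyr.

```r
# Same data as in the question, as a base R data frame.
df <- data.frame(v1 = 1:5, v2 = 6:10,
                 v3 = c(rep("A", 3), rep("B", 2)))
val <- "v2"

# Looking the name up outside the data frame fails: no object called v2
# exists in the calling environment. The error is caught and kept as text.
lookup <- tryCatch(get(val), error = function(e) conditionMessage(e))

# Subsetting the data frame by the string finds the column.
by_subset <- df[[val]]

# Evaluating the symbol with the data frame as the scope also finds it.
by_eval <- eval(as.symbol(val), envir = df)

sum(by_subset, na.rm = TRUE)
```

In other words, the string has to be resolved against the data, which is what the attempts above do not do.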
I've checked out several related questions, but none of the proposed solutions have worked for me so far.
Recommended answer
dplyr 1.0 has changed pretty much everything about this question as well as all of the answers. See the dplyr programming vignette here:
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
The new way to refer to columns when their identifier is stored as a character vector is to use the .data pronoun from rlang, and then subset it as you would in base R.
library(dplyr)
key <- "v3"
val <- "v2"
drp <- "v1"
df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
df %>%
  select(-matches(drp)) %>%
  group_by(.data[[key]]) %>%
  summarise(total = sum(.data[[val]], na.rm = TRUE))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 2
#>   v3    total
#>   <chr> <int>
#> 1 A        21
#> 2 B        19
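As a possible alternative to the .data pronoun, the same result can be sketched with across() and all_of(), both available from dplyr 1.0 onwards. This variant is not part of the original answer. Note that it keeps the summarised column's own name (v2) instead of total.

```r
library(dplyr)

key <- "v3"
val <- "v2"

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))

# all_of() selects the columns whose names are stored in the character
# vectors; across() applies the grouping or the summary to that selection.
res <- df %>%
  group_by(across(all_of(key))) %>%
  summarise(across(all_of(val), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

res
```

This form may be more convenient when key or val holds several column names, since all_of() accepts a character vector of any length.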
If your code is in a package function, you can use @importFrom rlang .data to avoid R CMD check notes about undefined globals.
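A minimal sketch of what such a package function might look like is shown here. The function name sum_by and its arguments are hypothetical, chosen only for illustration; the roxygen2 tag is the one mentioned above.

```r
#' Sum one column within groups, with both columns named by strings
#'
#' Hypothetical helper for illustration; not part of dplyr or rlang.
#'
#' @param data A data frame.
#' @param key,val Column names, each a length-one character vector.
#' @importFrom rlang .data
sum_by <- function(data, key, val) {
  # .data[[key]] and .data[[val]] look the columns up by their string names.
  grouped <- dplyr::group_by(data, .data[[key]])
  dplyr::summarise(grouped,
                   total = sum(.data[[val]], na.rm = TRUE),
                   .groups = "drop")
}

# Usage with the data from the question.
df <- data.frame(v1 = 1:5, v2 = 6:10,
                 v3 = c(rep("A", 3), rep("B", 2)))
out <- sum_by(df, key = "v3", val = "v2")
```

The import is only needed to satisfy the package checks; inside dplyr verbs the pronoun is resolved by the data mask, so the function also runs as a plain script.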