Using variable column names in dplyr summarise
Question
I found that this question has already been asked, but without a proper answer: "R using variable column names in summarise function in dplyr".
I want to calculate the difference between two column means, but the column names should be supplied by variables. So far I have only found the function as.name for providing column names as text, but somehow it doesn't work here.
With fixed column names, it works:
library(dplyr)
x <- c('a','b')
df <- group_by(data.frame(a=c(1,2,3,4), b=c(2,3,4,5), c=c(1,1,2,2)), c)
df %>% summarise(mean(a) - mean(b))
With variable column names, it doesn't work:
df %>% summarise(mean(x[1]) - mean(x[2]))
df %>% summarise(mean(as.name(x[1])) - mean(as.name(x[2])))
Since this was already asked 3 years ago and dplyr is under active development, I am wondering whether there is an answer to this now.
Recommended answer
You can use base::get:
df %>% summarise(mean(get(x[1])) - mean(get(x[2])))
# # A tibble: 2 x 2
# c `mean(a) - mean(b)`
# <dbl> <dbl>
# 1 1 -1
# 2 2 -1
get searches the current environment by default; inside summarise, that lookup also sees the columns of df, so the string in x[1] resolves to the column a.
As the error message says, mean expects a logical or numeric object, whereas as.name returns a name:
class(as.name("a")) # [1] "name"
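The same behaviour can be reproduced outside dplyr. The sketch below is only an illustration: the vector a is a hypothetical stand-in for the column, and the calls are plain base R. It shows that mean() on the symbol itself yields NA, while evaluating the symbol (or looking it up with get) returns the data.

```r
# Hypothetical stand-in for column `a`; not taken from the grouped data frame
a <- c(1, 2, 3, 4)

sym <- as.name("a")   # a symbol that refers to `a`, not the vector itself
is.numeric(sym)       # FALSE, so mean() has nothing numeric to average

# mean() on a bare symbol warns and returns NA instead of averaging
res <- suppressWarnings(mean(sym))
is.na(res)

# Evaluating the symbol, or looking the name up as a string, yields the vector
mean(eval(sym))
mean(get("a"))
```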
You could also evaluate the name, which works as well:
df %>% summarise(mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2]))))
# # A tibble: 2 x 2
# c `mean(eval(as.name(x[1]))) - mean(eval(as.name(x[2])))`
# <dbl> <dbl>
# 1 1 -1
# 2 2 -1
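As a further option that is not part of the original answer, more recent dplyr versions provide the .data pronoun, which allows subsetting columns by a string. The sketch below assumes such a version is installed; the output column name diff is chosen here purely for illustration.

```r
library(dplyr)

x <- c("a", "b")
df <- group_by(
  data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5), c = c(1, 1, 2, 2)),
  c
)

# .data[[name]] looks up the column whose name is stored in the string,
# evaluated per group; `diff` is an illustrative name for the result column
res <- df %>%
  summarise(diff = mean(.data[[x[1]]]) - mean(.data[[x[2]]]))
res
```

Compared with get or eval(as.name(...)), this form only ever looks inside the data frame, so a variable of the same name in the calling environment cannot be picked up by accident.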