dplyr with name of columns in a function

Question

I can't figure out how to use column names inside a function with the dplyr R package. A reproducible example is below:

Data

set.seed(12345)
Y <- rnorm(10)
Env <- paste0("E", rep(1:2, each = 5))
Gen <- paste0("G", rep(1:5, times = 2))
df1 <- data.frame(Y, Env, Gen)

Working outside a function

library(dplyr)

# The columns of df1 are named Env and Gen, so group by those names
df1 %>%
  dplyr::group_by(Env, Gen) %>%
  dplyr::summarize(mean(Y))

with(data = df1, expr = tapply(X = Y, INDEX = list(Env, Gen), FUN = mean))

First function

fn1 <- function(Y, E, G, data){
  Y <- deparse(substitute(Y))
  E <- deparse(substitute(E))
  G <- deparse(substitute(G))
  Out <- with(data = data, tapply(X = Y, INDEX = list(E, G), FUN = mean), parent.frame())
  return(Out)
}  

fn1(Y = Y, E = Env, G = Gen, data = df1)

Error in tapply(X = Y, INDEX = list(E, G), FUN = mean) : arguments must have same length

Second function

fn2 <- function(Y, E, G, data){
  Y <- deparse(substitute(Y))
  E <- deparse(substitute(E))
  G <- deparse(substitute(G))
  library(dplyr)
  Out <- df1 %>%
    dplyr::group_by(E, G) %>%
    dplyr::summarize(mean(Y))
  return(Out)
}  

fn2(Y = Y, E = Env, G = Gen, data = df1)

Error in grouped_df_impl(data, unname(vars), drop) : Column `E` is unknown


Recommended answer

One option is to use enquo to capture each expression and its environment in a quosure object. The quosure can then be evaluated within group_by, summarise, mutate, etc. by using the !! operator or UQ (unquote expression).

fn2 <- function(Y, E, G, data){
 E <- enquo(E)
 G <- enquo(G)
 Y <- enquo(Y)
 data %>%
    dplyr::group_by(!! E, !! G) %>%
    dplyr::summarize(Y = mean(!!Y))

}

fn2(Y, E = Env, G = Gen, df1)
# A tibble: 10 x 3
# Groups: Env [?]
#   Env    Gen         Y
#   <fctr> <fctr>  <dbl>
# 1 E1     G1      0.586
# 2 E1     G2      0.709
# 3 E1     G3     -0.109
# 4 E1     G4     -0.453
# 5 E1     G5      0.606
# 6 E2     G1     -1.82 
# 7 E2     G2      0.630
# 8 E2     G3     -0.276
# 9 E2     G4     -0.284
#10 E2     G5     -0.919
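
As a side note, more recent releases of rlang and dplyr offer the embrace operator `{{ }}`, which combines the enquo and !! steps shown above into one. The following is a minimal sketch of that variant, assuming dplyr 1.0.0 or later (for the `.groups` argument); the name fn2_embrace is made up for illustration and is not part of the original answer.

```r
library(dplyr)

# Same data as in the question
set.seed(12345)
df1 <- data.frame(
  Y   = rnorm(10),
  Env = paste0("E", rep(1:2, each = 5)),
  Gen = paste0("G", rep(1:5, times = 2))
)

# Hypothetical helper: {{ x }} captures the argument and unquotes it,
# which is shorthand for !! enquo(x)
fn2_embrace <- function(Y, E, G, data) {
  data %>%
    group_by({{ E }}, {{ G }}) %>%
    summarise(Y = mean({{ Y }}), .groups = "drop")
}

res_embrace <- fn2_embrace(Y, E = Env, G = Gen, data = df1)
```

The call site is unchanged: column names are still passed unquoted, and the result keeps the original column names Env and Gen.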






In the OP's function, the expression is captured by substitute and then converted to a string with deparse. Using sym from rlang, that string can be converted back to a symbol and then evaluated with !! or UQ as above.

fn2 <- function(Y, E, G, data){
  Y <- deparse(substitute(Y))
  E <- deparse(substitute(E))
  G <- deparse(substitute(G))

  # Use the data argument rather than the global df1
  data %>%
    dplyr::group_by(!! rlang::sym(E), !! rlang::sym(G)) %>%
    dplyr::summarize(Y = mean(!! rlang::sym(Y)))
}

fn2(Y = Y, E = Env, G = Gen, data = df1)
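
If the column names are already available as strings (for example, read from a config file), the deparse(substitute()) step is not needed at all. The sketch below illustrates that case with rlang::sym and rlang::syms; the function name fn2_chr and its argument names y and groups are assumptions made for this example.

```r
library(dplyr)

# Same data as in the question
set.seed(12345)
df1 <- data.frame(
  Y   = rnorm(10),
  Env = paste0("E", rep(1:2, each = 5)),
  Gen = paste0("G", rep(1:5, times = 2))
)

# Hypothetical helper that takes column names as character strings
fn2_chr <- function(data, y, groups) {
  data %>%
    # syms() turns a character vector into a list of symbols;
    # !!! splices that list into group_by() as separate arguments
    group_by(!!! rlang::syms(groups)) %>%
    # sym() turns one string into one symbol, unquoted with !!
    summarise(Y = mean(!! rlang::sym(y)))
}

res_chr <- fn2_chr(df1, y = "Y", groups = c("Env", "Gen"))
```

Taking the grouping columns as one character vector also means the function is not tied to exactly two grouping variables.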

Another variant of the OP's function, without using rlang, is to use group_by_at and summarise_at, which can take strings as arguments.

fn3 <- function(Y, E, G, data){
  Y <- deparse(substitute(Y))
  E <- deparse(substitute(E))
  G <- deparse(substitute(G))

  # Use the data argument, and pass the column names as character vectors
  data %>%
    dplyr::group_by_at(c(E, G)) %>%
    dplyr::summarize_at(Y, mean)
}

fn3(Y = Y, E = Env, G = Gen, data = df1)
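
Since dplyr 1.0.0 the _at variants are marked as superseded in favour of across(). A rough equivalent of the string-based approach in that style might look like the following; this assumes dplyr 1.0.0 or later, and fn3_across with its arguments y and groups is a hypothetical name used only for this sketch.

```r
library(dplyr)

# Same data as in the question
set.seed(12345)
df1 <- data.frame(
  Y   = rnorm(10),
  Env = paste0("E", rep(1:2, each = 5)),
  Gen = paste0("G", rep(1:5, times = 2))
)

# Hypothetical helper: all_of() selects columns named in a character
# vector and raises an error if any of them is missing from the data
fn3_across <- function(data, y, groups) {
  data %>%
    group_by(across(all_of(groups))) %>%
    summarise(across(all_of(y), mean), .groups = "drop")
}

res_across <- fn3_across(df1, y = "Y", groups = c("Env", "Gen"))
```

Because all_of() looks the names up strictly in the character vector, a data column that happens to be called y or groups cannot be selected by accident.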
