Using dplyr within a function, non-standard evaluation


Question

I'm trying to get my head around non-standard evaluation (NSE) as used by dplyr, but without success. I'd like a short function that returns summary statistics (N, mean, sd, median, IQR, min, max) for a specified set of variables.

A simplified version of my function:

my_summarise <- function(df = temp,
                         to.sum = 'eg1',
                         ...){
    ## Summarise
    results <- summarise_(df,
                          n = ~n(),
                          mean = mean(~to.sum, na.rm = TRUE))
    return(results)
}

Running it with some dummy data:

set.seed(43290)
temp <- cbind(rnorm(n = 100, mean = 2, sd = 4),
              rnorm(n = 100, mean = 3, sd = 6)) %>% as.data.frame()
names(temp) <- c('eg1', 'eg2')
mean(temp$eg1)
  [1] 1.881721
mean(temp$eg2)
  [1] 3.575819
my_summarise(df = temp, to.sum = 'eg1')
    n mean
1 100   NA

N is calculated, but the mean is not, and I can't figure out why.

Ultimately I'd like my function to be more general, along the lines of:

my_summarise <- function(df = temp,
                         group.by = 'group',
                         to.sum = c('eg1', 'eg2'),
                         ...){
    results <- list()
    ## Select columns
    df <- dplyr::select_(df, .dots = c(group.by, to.sum))
    ## Summarise overall
    results$all <- summarise_each(df,
                                  funs(n = ~n(),
                                       mean = mean(~to.sum, na.rm = TRUE)))
    ## Summarise by specified group
    results$by.group <- group_by_(df, ~to.group) %>%
                        summarise_each(df,
                                       funs(n = ~n(),
                                       mean = mean(~to.sum, na.rm = TRUE)))        
    return(results)
}

...but before I move on to this more complex version (for which I was using this example as guidance), I need to get the evaluation working in the simple version first, as that's the stumbling block. The call to dplyr::select() works OK.

I'd appreciate any advice as to where I'm going wrong.

Thanks in advance.

Recommended answer

The basic idea is that you have to build the appropriate call yourself, which is most easily done with the lazyeval package.

In this case you want to programmatically create a call that looks like ~mean(eg1, na.rm = TRUE). This is how:

my_summarise <- function(df = temp,
                         to.sum = 'eg1',
                         ...){
  ## Summarise
  results <- summarise_(df,
                        n = ~n(),
                        mean = lazyeval::interp(~mean(x, na.rm = TRUE),
                                                x = as.name(to.sum)))
  return(results)
}
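
As background on why the original version returned NA, here is a minimal base R sketch. One plausible reading is that in mean(~to.sum, na.rm = TRUE) the ~ sits inside the call, so mean() is evaluated on a formula object before summarise_() can treat anything as an expression. The variable names eager and lazy are illustrative only, not part of the original post.

```r
to.sum <- "eg1"

# The original argument: mean() runs on a formula object, which is neither
# numeric nor logical, so mean.default() warns and returns NA
eager <- suppressWarnings(mean(~to.sum, na.rm = TRUE))
is.na(eager)

# What summarise_() expects instead: a formula wrapping the whole call,
# left unevaluated until dplyr evaluates it against the data frame
lazy <- ~mean(eg1, na.rm = TRUE)
class(lazy)
```

The fix in the answer's function is therefore about where the ~ goes: around the whole call rather than around the argument.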

Here is what I do when I struggle to get things working:

  1. Remember that, just like the ~n() you already have, the call has to start with a ~.
  2. Write the correct call with the actual variable and see if it works (~mean(eg1, na.rm = TRUE)).
  3. Use lazyeval::interp to recreate that call, and check it by running only the interp to see what it produces.
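
The checklist above can be sketched roughly as follows, assuming the lazyeval package is installed. The names by_hand, built, same_call and built_mean are hypothetical helpers for illustration, and the dummy data only mimics the shape of the question's temp.

```r
library(lazyeval)

# Dummy data in the same shape as the question's `temp`
set.seed(43290)
temp <- data.frame(eg1 = rnorm(100, mean = 2, sd = 4),
                   eg2 = rnorm(100, mean = 3, sd = 6))

# Step 2: the call written out by hand with the real column name
by_hand <- ~mean(eg1, na.rm = TRUE)

# Step 3: the same call rebuilt from a string with interp()
to.sum <- "eg1"
built <- interp(~mean(x, na.rm = TRUE), x = as.name(to.sum))

# Run only the interp() result to inspect what was built
print(built)

# Compare the right-hand sides of the two formulas as text
same_call <- identical(deparse(by_hand[[2]]), deparse(built[[2]]))

# Evaluate the built call against the data, with columns as variables
built_mean <- eval(built[[2]], temp)
```

Printing built before passing it to summarise_() is the quick visual check described in step 3.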

In this case my first attempt would often be interp(~mean(x, na.rm = TRUE), x = to.sum). But running that gives ~mean("eg1", na.rm = TRUE), which treats eg1 as a character string instead of a variable name. So we use as.name, as taught in vignette("nse").
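
To make the difference concrete, the following sketch (again assuming lazyeval is installed) builds both versions and inspects the argument that ends up inside the call. The names as_string and as_symbol are illustrative only.

```r
library(lazyeval)

to.sum <- "eg1"

# Interpolating the string itself puts a character constant in the call
as_string <- interp(~mean(x, na.rm = TRUE), x = to.sum)

# Converting to a name first puts a symbol (a variable reference) in the call
as_symbol <- interp(~mean(x, na.rm = TRUE), x = as.name(to.sum))

# The first argument of each mean() call: character versus name
string_arg_class <- class(as_string[[2]][[2]])
symbol_arg_class <- class(as_symbol[[2]][[2]])

# Taking the mean of a character string is not meaningful: R warns and gives NA
string_result <- suppressWarnings(eval(as_string[[2]]))
```

Only the symbol version refers to a column, which is why it is the one that works once dplyr evaluates it against the data frame.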
