Passing arguments to dplyr summarize function


Question

I am trying to use the summarize function within dplyr to calculate summary statistics using a two-argument function that takes a table and a field name from a connected database. Unfortunately, as soon as I wrap the summarize function in another function, the results aren't correct. The resulting data frame does not compute the statistics per group; every row holds the same values. I'll show the input/output below:

Summary statistics function:

library(dplyr)

data<-iris
data<- group_by(.data = data,Species)

SummaryStatistics <- function(table, field){
table %>%
summarise(count = n(),
          min = min(table[[field]], na.rm = T),
          mean = mean(table[[field]], na.rm = T, trim=0.05),
          median = median(table[[field]], na.rm = T))
}

SummaryStatistics(data, "Sepal.Length")

Output table (incorrect; it just repeats the same calculation for every group):

     Species count   min     mean median
1     setosa    50   4.3 5.820588    5.8
2 versicolor    50   4.3 5.820588    5.8
3  virginica    50   4.3 5.820588    5.8

Correct table / desired outcome: this is what the table should look like. When I run the summarize function outside of the wrapper function, this is what it produces.

      Species count   min     mean median
 1     setosa    50   4.3 5.002174    5.0
 2 versicolor    50   4.9 5.934783    5.9
 3  virginica    50   4.9 6.593478    6.5

I hope this is easy to understand. I just can't grasp why the summary statistics work perfectly outside of the wrapper function, but as soon as I pass arguments to it, it calculates the same thing for each row. Any help would be greatly appreciated.

Thanks, Kev

Recommended answer

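dplyr's verbs use Non-Standard Evaluation (NSE), so to use them programmatically you need the standard-evaluation variants (here summarise_()) alongside lazyeval. Inside your wrapper, table[[field]] extracts the column from the entire table rather than from the current group, which is why every row shows the same values. The dplyr NSE vignette covers it fairly well.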

library(dplyr)
library(lazyeval)

data <- group_by(iris, Species)

SummaryStatistics <- function(table, field){
  table %>%
    summarise_(count = ~n(),
              min = interp(~min(var, na.rm = T), var = as.name(field)),
              mean = interp(~mean(var, na.rm = T, trim=0.05), var = as.name(field)),
              median = interp(~median(var, na.rm = T), var = as.name(field)))
}

SummaryStatistics(data, "Sepal.Length")

# A tibble: 3 × 5
     Species count   min     mean median
      <fctr> <int> <dbl>    <dbl>  <dbl>
1     setosa    50   4.3 5.002174    5.0
2 versicolor    50   4.9 5.934783    5.9
3  virginica    50   4.9 6.593478    6.5
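
A note for newer dplyr versions: summarise_() and lazyeval-based interpolation were deprecated from dplyr 0.7.0 onwards in favor of tidy evaluation. Below is a minimal sketch of the same wrapper using the .data pronoun, which looks the column up by name within the current group. It assumes a local data frame (I have not checked how a database backend would translate it), and the name summary_statistics is only illustrative.

```r
library(dplyr)

# Hypothetical rewrite of the wrapper for dplyr >= 0.7.0.
# `field` is a column name given as a string; .data[[field]] resolves it
# inside each group, unlike table[[field]], which always returns the
# full column of the whole table.
summary_statistics <- function(table, field) {
  table %>%
    summarise(
      count  = n(),
      min    = min(.data[[field]], na.rm = TRUE),
      mean   = mean(.data[[field]], na.rm = TRUE, trim = 0.05),
      median = median(.data[[field]], na.rm = TRUE)
    )
}

result <- summary_statistics(group_by(iris, Species), "Sepal.Length")
print(result)
```

On the iris data this should give the same per-group values as the table above.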
