Passing arguments to dplyr summarize function

Problem description

I am trying to use the summarize function in dplyr to calculate summary statistics inside a two-argument function that takes a table and a field name from a connected database. Unfortunately, as soon as I wrap the summarize call in another function, the results are wrong. The resulting data frame does not compute the statistics for each row separately; it repeats the same values. The input and output are shown below:

Summary Statistics Function

library(dplyr)

data <- iris
data <- group_by(.data = data, Species)

SummaryStatistics <- function(table, field){
table %>%
summarise(count = n(),
          min = min(table[[field]], na.rm = T),
          mean = mean(table[[field]], na.rm = T, trim=0.05),
          median = median(table[[field]], na.rm = T))
}

SummaryStatistics(data, "Sepal.Length")

Output table (incorrect): it just repeats the same calculation on every row

     Species count   min     mean median
1     setosa    50   4.3 5.820588    5.8
2 versicolor    50   4.3 5.820588    5.8
3  virginica    50   4.3 5.820588    5.8

Correct table (desired outcome): this is what the table should look like. When I run the summarize function outside of the wrapper function, this is what it produces.

      Species count   min     mean median
 1     setosa    50   4.3 5.002174    5.0
 2 versicolor    50   4.9 5.934783    5.9
 3  virginica    50   4.9 6.593478    6.5

I hope this is easy to understand. I just can't grasp why the summary statistics work perfectly outside of the wrapper function, but as soon as I pass arguments to it, it calculates the same thing for each row. Any help would be greatly appreciated.

Thanks, Kev

Solution

To use dplyr functions programmatically, you need their standard-evaluation versions (here summarise_()) together with lazyeval. The dplyr NSE vignette covers this fairly well.
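
For context, a short check of why the wrapper in the question repeats the same numbers: table[[field]] extracts the whole column from the grouped table and ignores the grouping, so every group is summarised with the same full-column vector. This is a minimal sketch using only dplyr and the built-in iris data; the variable names whole_column and overall_min are illustrative only.

```r
library(dplyr)

data <- group_by(iris, Species)

# `[[` pulls the complete column out of the grouped table;
# the grouping by Species plays no part here.
whole_column <- data[["Sepal.Length"]]
length(whole_column)  # all 150 values, not 50 per group

# Inside summarise(), this same full-length vector is reused for every group,
# which is why each row of the incorrect table shows identical statistics.
overall_min  <- min(whole_column, na.rm = TRUE)
overall_mean <- mean(whole_column, na.rm = TRUE, trim = 0.05)
```

The standard-evaluation version that follows avoids this by turning the field name into a symbol that dplyr evaluates within each group.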

library(dplyr)
library(lazyeval)

data <- group_by(iris, Species)

SummaryStatistics <- function(table, field){
  table %>%
    summarise_(count = ~n(),
              min = interp(~min(var, na.rm = T), var = as.name(field)),
              mean = interp(~mean(var, na.rm = T, trim=0.05), var = as.name(field)),
              median = interp(~median(var, na.rm = T), var = as.name(field)))
}

SummaryStatistics(data, "Sepal.Length")

# A tibble: 3 × 5
     Species count   min     mean median
      <fctr> <int> <dbl>    <dbl>  <dbl>
1     setosa    50   4.3 5.002174    5.0
2 versicolor    50   4.9 5.934783    5.9
3  virginica    50   4.9 6.593478    6.5
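
As a side note, the underscore verbs such as summarise_() and the lazyeval approach were deprecated in later dplyr releases (0.7.0 onward, as far as I know) in favour of tidy evaluation. The sketch below is not the method from the answer above; it is an alternative that assumes a dplyr version providing the .data pronoun, and the name summary_statistics is hypothetical.

```r
library(dplyr)

# Hypothetical wrapper: `field` is a column name passed as a string.
summary_statistics <- function(table, field) {
  table %>%
    summarise(
      count  = n(),
      # .data[[field]] looks the column up inside the current group,
      # unlike table[[field]], which always returns the whole column.
      min    = min(.data[[field]], na.rm = TRUE),
      mean   = mean(.data[[field]], na.rm = TRUE, trim = 0.05),
      median = median(.data[[field]], na.rm = TRUE)
    )
}

result <- summary_statistics(group_by(iris, Species), "Sepal.Length")
print(result)
```

Called this way, the function should produce one row per species, with the same per-group values as the table above.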
