Changing names of resulting variables in custom dplyr function
Question
Background
To speed up generating grouped summaries across multiple tables, and since I do most of that work inside a dplyr workflow, I've drafted a simple function that generates the desired metrics:
library(dplyr)

# Function to generate summary table
generate_summary_tbl <- function(dataset, group_column, summary_column) {
  group_column <- enquo(group_column)
  summary_column <- enquo(summary_column)
  dataset %>%
    group_by(!!group_column) %>%
    summarise(
      mean = mean(!!summary_column),
      sum = sum(!!summary_column)
      # Other metrics that need to be generated frequently
    ) %>%
    ungroup -> smryDta
  return(smryDta)
}
Example
The function works as desired:
mtcars %>%
  generate_summary_tbl(group_column = am, summary_column = mpg)
# A tibble: 2 x 3
am mean sum
<dbl> <dbl> <dbl>
1 0 17.14737 325.8
2 1 24.39231 317.1
Problem
I would like to conditionally include the name of the column passed via summary_column = mpg in the resulting column names.
Example results with useColName = TRUE
When called with useColName = TRUE, the results should correspond to:
mtcars %>%
  generate_summary_tbl(group_column = am, summary_column = mpg,
                       useColName = TRUE)
# A tibble: 2 x 3
am mean_am sum_am
<dbl> <dbl> <dbl>
1 0 17.14737 325.8
2 1 24.39231 317.1
The difference is the presence of the _am suffix in the variable names (mean_am and so on).
Ugly solution
The partial, ugly solution I have uses setNames:
# Function to generate summary table
generate_summary_tbl <-
  function(dataset,
           group_column,
           summary_column,
           useColName = TRUE) {
    group_column <- enquo(group_column)
    summary_column <- enquo(summary_column)
    dataset %>%
      group_by(!!group_column) %>%
      summarise(mean = mean(!!summary_column),
                sum = sum(!!summary_column)) %>%
      ungroup -> smryDta
    if (useColName) {
      # group_column now holds a quosure, so deparse(substitute()) yields "~am";
      # paste() also inserts its default " " separator
      setNames(smryDta,
               c(deparse(substitute(group_column)),
                 paste(names(smryDta)[2:length(smryDta)],
                       paste0("_", deparse(substitute(group_column)))))) -> smryDta
    }
    return(smryDta)
  }
Example
The returned column names almost match the desired results. I reckon I could employ some regex to arrive at the desired names; however, I suspect that more efficient solutions are available.
mtcars %>%
generate_summary_tbl(group_column = am, summary_column = mpg, useColName = TRUE)
# A tibble: 2 x 3
`~am` `mean _~am` `sum _~am`
<dbl> <dbl> <dbl>
1 0 17.14737 325.8
2 1 24.39231 317.1
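The stray `~` appears because, after `group_column <- enquo(group_column)`, the local variable holds a quosure, so `substitute()` simply returns that quosure and `deparse()` prints it in formula form. A small sketch, using a hypothetical helper `show_labels()` and assuming the rlang package is installed, compares that route with rlang's `as_label()` and `quo_name()`, which should both return the bare column name:

```r
library(rlang)

# Hypothetical helper: label the same captured argument in three ways
show_labels <- function(group_column) {
  group_column <- enquo(group_column)
  c(
    # The local variable now holds a quosure, so substitute() returns it
    # unchanged and deparse() prints it with the formula tilde
    deparse = deparse(substitute(group_column)),
    # as_label() and quo_name() unwrap the quosure and return the bare name
    as_label = as_label(group_column),
    quo_name = quo_name(group_column)
  )
}

show_labels(am)
```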
How can I get the desired column names, ideally making better use of quo or lazyeval?
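One way to stay within quosures, sketched here under the assumption of dplyr 0.7 or later and rlang 0.3.0 or later (for `as_label()`), is to build the output names as strings before summarising and inject them with `!!` on the left-hand side of `:=`. The function name `generate_summary_tbl2()` is hypothetical; unlike the setNames() attempt, this produces the `mean_am` / `sum_am` suffix form directly:

```r
library(dplyr)
library(rlang)

# Hypothetical variant: compute the output names first, then inject them
generate_summary_tbl2 <- function(dataset, group_column, summary_column,
                                  useColName = FALSE) {
  group_column <- enquo(group_column)
  summary_column <- enquo(summary_column)

  # as_label() turns the quosure for `am` into the plain string "am"
  suffix <- if (useColName) paste0("_", as_label(group_column)) else ""
  mean_name <- paste0("mean", suffix)
  sum_name <- paste0("sum", suffix)

  dataset %>%
    group_by(!!group_column) %>%
    summarise(
      # `!!name := value` uses a string as the new column name
      !!mean_name := mean(!!summary_column),
      !!sum_name := sum(!!summary_column)
    ) %>%
    ungroup()
}

mtcars %>% generate_summary_tbl2(am, mpg, useColName = TRUE)
```

Building the names up front avoids a second renaming pass, at the cost of listing each metric's name explicitly.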
Answer
Maybe use rename_at():
library(tidyverse)

generate_summary_tbl <- function(dataset, group_column, summary_column, useColname = FALSE) {
  group_column <- enquo(group_column)
  summary_column <- enquo(summary_column)
  dataset %>%
    group_by(!!group_column) %>%
    summarise(
      mean = mean(!!summary_column),
      sum = sum(!!summary_column)
      # Other metrics that need to be generated frequently
    ) %>%
    ungroup -> smryDta
  if (useColname) {
    # Prefix every non-grouping column with the grouping column's name
    smryDta <- smryDta %>%
      rename_at(
        vars(-one_of(quo_name(group_column))),
        ~paste(quo_name(group_column), .x, sep = "_")
      )
  }
  return(smryDta)
}
mtcars %>% generate_summary_tbl(am, mpg)
# # A tibble: 2 x 3
# am mean sum
# <dbl> <dbl> <dbl>
# 1 0 17.14737 325.8
# 2 1 24.39231 317.1
mtcars %>% generate_summary_tbl(am, mpg, useColname = TRUE)
# # A tibble: 2 x 3
# am am_mean am_sum
# <dbl> <dbl> <dbl>
# 1 0 17.14737 325.8
# 2 1 24.39231 317.1
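In dplyr 1.0.0 and later, `rename_at()` is superseded by `rename_with()`, and the `{{ }}` operator can stand in for the explicit `enquo()`/`!!` pair. A sketch of the same renaming idea under those assumptions (the name `generate_summary_tbl3()` is hypothetical) appends the group name as a suffix, so the columns follow the question's `mean_am` / `sum_am` form rather than the `am_mean` prefix shown above:

```r
library(dplyr)

# Sketch assuming dplyr >= 1.0.0 (rename_with) and rlang >= 0.4.0 ({{ }})
generate_summary_tbl3 <- function(dataset, group_column, summary_column,
                                  useColName = FALSE) {
  # Plain string name of the grouping column, e.g. "am"
  group_name <- rlang::as_label(rlang::enquo(group_column))
  smryDta <- dataset %>%
    group_by({{ group_column }}) %>%
    summarise(
      mean = mean({{ summary_column }}),
      sum = sum({{ summary_column }})
    ) %>%
    ungroup()
  if (useColName) {
    # Append "_<group>" to every column except the grouping column itself
    smryDta <- smryDta %>%
      rename_with(~ paste(.x, group_name, sep = "_"), -all_of(group_name))
  }
  smryDta
}

mtcars %>% generate_summary_tbl3(am, mpg, useColName = TRUE)
```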