dplyr and Non-standard evaluation (NSE)
I'm trying to write a function that takes in the name of a data frame and a column to summarize by using dplyr, then returns the summarized data frame. I've tried a bunch of permutations of interp() from the lazyeval package, but I've spent way too much time trying to get it to work. So, I wrote a "static" version of the function I want here:
library(dplyr)

summarize.df.static <- function() {
  temp_df <- mtcars %>%
    group_by(cyl) %>%
    summarize(qsec = mean(qsec),
              mpg = mean(mpg))
  return(temp_df)
}
new_df <- summarize.df.static()
head(new_df)
Here is the start of the dynamic version I'm stuck on:
summarize.df.dynamic <- function(df_in, sum_metric_in) {
  temp_df <- df_in %>%
    group_by(cyl) %>%
    summarize_(qsec = mean(qsec),
               sum_metric_in = mean(sum_metric_in)) # some mix of interp()
  return(temp_df)
}
new_df <- summarize.df.dynamic(mtcars, "mpg")
head(new_df)
Note that I want the name of the output column to come from the parameter passed in as well (mpg in this case). Also note that the qsec column is static, i.e. not passed in.
Below is the correct answer posted by "docendo discimus":
library(dplyr)
library(lazyeval)  # provides interp()

summarize.df.dynamic <- function(df_in, sum_metric_in) {
  temp_df <- df_in %>%
    group_by(cyl) %>%
    summarize_(qsec = ~mean(qsec),
               # substitute the column named by sum_metric_in for var
               xyz = interp(~mean(var), var = as.name(sum_metric_in)))
  # rename the placeholder column "xyz" to the requested name
  names(temp_df)[names(temp_df) == "xyz"] <- sum_metric_in
  return(temp_df)
}
new_df <- summarize.df.dynamic(mtcars, "mpg")
head(new_df)
#  cyl     qsec      mpg
#1   4 19.13727 26.66364
#2   6 17.97714 19.74286
#3   8 16.77214 15.10000
new_df <- summarize.df.dynamic(mtcars, "disp")
head(new_df)
#  cyl     qsec     disp
#1   4 19.13727 105.1364
#2   6 17.97714 183.3143
#3   8 16.77214 353.1000
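To see roughly what interp(~mean(var), var = as.name(sum_metric_in)) is doing, here is a small base-R analogue. It is only a sketch of the idea, not lazyeval's implementation: bquote() stands in for interp(), and the expression is evaluated directly against mtcars rather than inside summarize_().

```r
sum_metric_in <- "mpg"

# as.name() turns the string "mpg" into the symbol mpg
sym <- as.name(sum_metric_in)

# bquote() splices the symbol into the call, giving mean(mpg)
expr <- bquote(mean(.(sym)))
print(expr)

# Evaluate the call with the data frame's columns in scope
result <- eval(expr, mtcars)
print(result)
```

The key point is that the string has to become a symbol before it is placed inside the call; passing the bare string would compute mean("mpg") instead of the mean of the column.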
For the specific example (with a static "qsec" column, etc.) you could do:
library(dplyr)
library(lazyeval)
summarize.df <- function(data, sum_metric_in) {
  data <- data %>%
    group_by(cyl) %>%
    summarize_(qsec = ~mean(qsec),
               xyz = interp(~mean(var), var = as.name(sum_metric_in)))
  names(data)[names(data) == "xyz"] <- sum_metric_in
  data
}
summarize.df(mtcars, "mpg")
#Source: local data frame [3 x 3]
#
#  cyl     qsec      mpg
#1   4 19.13727 26.66364
#2   6 17.97714 19.74286
#3   8 16.77214 15.10000
AFAIK you cannot (yet?) pass the input "sum_metric_in" to dplyr::rename, which is what you would normally use to rename the column. That is why the example renames the column through names() instead.
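For readers on newer dplyr releases (0.7 and later), the underscore verbs such as summarize_() are deprecated in favor of tidy evaluation, which also removes the need for the rename step. The following is a minimal sketch under that assumption; the function name summarize_df_tidy is hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical name; takes the same inputs as summarize.df above
summarize_df_tidy <- function(data, sum_metric_in) {
  data %>%
    group_by(cyl) %>%
    summarize(
      qsec = mean(qsec),
      # !! with := uses the string as the output column name;
      # .data[[...]] looks up the input column by its string name
      !!sum_metric_in := mean(.data[[sum_metric_in]])
    )
}

res <- summarize_df_tidy(mtcars, "mpg")
print(res)
```

This should produce the same three columns (cyl, qsec, mpg) as the lazyeval version, with the dynamic column named directly instead of going through the "xyz" placeholder.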