Functional programming with dplyr


Problem description

I'm looking for a more efficient and elegant way to pass multiple arguments to group_by using non-standard evaluation inside a function that uses dplyr. I don't want to use the ... operator; I want to specify the arguments individually.

My specific use case is a function which takes a data frame and creates a ggplot object with simpler syntax. Here is an example of the code I want to automate with my function:

# load the packages used below
library(dplyr)
library(ggplot2)

# create data frame
my_df <- data.frame(month = sample(1:12, 1000, replace = TRUE),
                    category = sample(head(letters, 3), 1000, replace = TRUE),
                    approved = as.numeric(runif(1000) < 0.5))

my_df$converted <- my_df$approved * as.numeric(runif(1000) < 0.5)

my_df %>%
  group_by(month, category) %>%
  summarize(conversion_rate = sum(converted) / sum(approved)) %>%
  ggplot() +
  geom_line(aes(x = month, y = conversion_rate, group = category,
                color = category))

I want to combine that group_by, summarize, ggplot, and geom_line into a simple function that I can feed an x, y, and group, and have it perform all the dirty work under the hood. Here's what I've gotten to work:

# create the function that does the grouping and plotting
plot_lines <- function(df, x, y, group) {

  x <- enquo(x)
  group <- enquo(group)
  group_bys <- quos(!! x, !! group)

  df %>%
    group_by(!!! group_bys) %>%
    my_smry %>%
    ggplot + geom_line(aes_(x = substitute(x), y = substitute(y), 
    group = substitute(group), color = substitute(group)))
}

# create a function to do the summarization
my_smry <- function(x) {
  x %>% 
    summarize(conversion_rate = sum(converted) / sum(approved))
}

# use my function
my_df %>% 
  plot_lines(x = month, y = conversion_rate, group = category)

I feel like the group_by handling is pretty inelegant: quoting x and group with enquo, then unquoting them with !! inside of another quoting function quos, only to re-unquote them with !!! on the next line, but it's the only thing I've been able to get to work. Is there a better way to do this?

Also, is there a way to get ggplot to take !! instead of substitute? What I'm doing feels inconsistent.

Solution

The problem is that ggplot hasn't been updated to handle quosures yet, so you've got to pass it expressions, which you can create from quosures with rlang::quo_expr:

library(tidyverse)
set.seed(47)

my_df <- tibble(month = sample(1:12, 1000, replace = TRUE),
                category = sample(head(letters, 3), 1000, replace = TRUE),
                approved = as.numeric(runif(1000) < 0.5),
                converted = approved * as.numeric(runif(1000) < 0.5))

plot_lines <- function(df, x, y, group) {
    x <- enquo(x)
    y <- enquo(y)
    group <- enquo(group)

    df %>%
        group_by(!! x, !! group) %>%
        summarise(conversion_rate = sum(converted) / sum(approved)) %>%
        ggplot(aes_(x = rlang::quo_expr(x), 
                    y = rlang::quo_expr(y), 
                    color = rlang::quo_expr(group))) + 
        geom_line()
}

my_df %>% plot_lines(month, conversion_rate, category)
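
To make the group_by part concrete, here is a minimal sketch comparing the quos()/!!! round trip from the question with unquoting each captured argument directly. The helper names group_round_trip and group_direct and the toy data frame are hypothetical, introduced only for illustration. The last lines use rlang::quo_get_expr, which I am assuming is available in your rlang version, to show the bare expression stored inside a quosure.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: the round trip from the question (quos() followed by !!!)
group_round_trip <- function(df, x, group) {
  x <- enquo(x)
  group <- enquo(group)
  group_bys <- quos(!!x, !!group)
  df %>% group_by(!!!group_bys)
}

# Hypothetical helper: unquote each captured argument directly
group_direct <- function(df, x, group) {
  x <- enquo(x)
  group <- enquo(group)
  df %>% group_by(!!x, !!group)
}

# Toy data, made up for this sketch
toy <- data.frame(month = c(1, 1, 2), category = c("a", "b", "a"))

vars_round_trip <- group_vars(group_round_trip(toy, month, category))
vars_direct <- group_vars(group_direct(toy, month, category))

# Compare the grouping variables produced by the two helpers
identical(vars_round_trip, vars_direct)

# A quosure bundles an expression with the environment it came from;
# quo_get_expr() extracts just the expression
q <- quo(month)
is_quosure(q)
quo_get_expr(q)
```

For a simple column name like month, the extracted expression is just the bare symbol, which is the kind of object aes_() accepts in the function above.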

However, keep in mind that ggplot will almost inevitably be updated from lazyeval to rlang, so while this interface will probably keep working, a simpler, more consistent one will probably be possible shortly.
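
As a rough sketch of what that simpler interface can look like: assuming ggplot2 3.0.0 or later, where aes() supports quasiquotation, the same !! used in group_by can be used inside aes(). The function name plot_lines_tidy is hypothetical, and the data is rebuilt here only so the example is self-contained.

```r
library(dplyr)
library(ggplot2)

# Hypothetical variant of plot_lines(); assumes ggplot2 >= 3.0.0,
# where aes() accepts !! on quosures
plot_lines_tidy <- function(df, x, y, group) {
  x <- enquo(x)
  y <- enquo(y)
  group <- enquo(group)

  df %>%
    group_by(!!x, !!group) %>%
    summarise(conversion_rate = sum(converted) / sum(approved)) %>%
    ggplot(aes(x = !!x, y = !!y, color = !!group)) +
    geom_line()
}

# Same shape of data as in the question
set.seed(47)
my_df <- data.frame(month = sample(1:12, 1000, replace = TRUE),
                    category = sample(head(letters, 3), 1000, replace = TRUE),
                    approved = as.numeric(runif(1000) < 0.5))
my_df$converted <- my_df$approved * as.numeric(runif(1000) < 0.5)

p <- my_df %>% plot_lines_tidy(month, conversion_rate, category)
```

With rlang 0.4.0 or later, the enquo() and !! pair can usually be shortened to {{ x }} written directly inside group_by() and aes(); which form to prefer is mostly a matter of taste and of the package versions you need to support.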
