How to define a function in dplyr?

Question

I created a simple pivot table in the dplyr package in R. Here is my working example:

library(dplyr)
mean_mpg <- mean(mtcars$mpg)

# Flag whether each car's miles per (US) gallon is above the mean (1) or not (0)

mtcars <-
mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean_mpg, 1,0))

mtcars %>%
  group_by(as.factor(cyl)) %>%
  summarise(sum=sum(mpg_cat),total=n()) %>%
  mutate(percentage=sum*100/total)

Now, I want to write a function to reuse this code:

get_pivot <- function(data, predictor,target) {
  result <-
    data %>%
    group_by(as.factor(predictor)) %>%
    summarise(sum=sum(target),total=n()) %>%
    mutate(percentage=sum*100/total);

  print(result)
}

But I get the following error:

Error in is.factor(x) : object 'cyl' not found

I also tried

get_pivot(mtcars, "cyl", "mpg_cat" )

but that did not work either.

What should I do?

Recommended answer

If you have rlang v0.4.0 (released June 2019) or later, you can use double curly braces {{ }} (aka "curly curly") to make programming with dplyr easier.

# Note: needs installation of rlang 0.4.0 or later
get_pivot <- function(data, predictor,target) {
  result <-
    data %>%
    group_by(as.factor( {{ predictor }} )) %>%
    summarise(sum=sum( {{ target }} ),total=n()) %>%
    mutate(percentage=sum*100/total);

  print(result)
}

# Edit -- thank you Rui Barradas
> get_pivot(mtcars, cyl, mpg_cat)
# A tibble: 3 x 4
  `as.factor(cyl)`   sum total percentage
  <fct>            <dbl> <int>      <dbl>
1 4                   11    11      100  
2 6                    3     7       42.9
3 8                    0    14        0  

The reason this is required is that dplyr and other tidyverse packages use "non-standard evaluation" like you encounter with some base R functions, like lm(mpg~factor(am),data=mtcars). This practice often makes "interactive" code shorter, simpler, and easier to read, but at the cost of making programming more complicated. In this case, the {{ }} operator serves to transport the column you specify into the context of the function.
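To make the "transport" idea more concrete, here is a rough sketch of the older two-step pattern that {{ }} abbreviates: capture the argument with enquo(), then splice it back in with !!. The function name get_pivot_enquo and the cars data frame are made up for illustration and are not part of the original answer; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical variant of get_pivot(), written without {{ }}.
get_pivot_enquo <- function(data, predictor, target) {
  # enquo() captures the expression the caller typed (e.g. cyl), unevaluated
  predictor <- enquo(predictor)
  target <- enquo(target)

  data %>%
    group_by(as.factor(!!predictor)) %>%            # !! splices it back in
    summarise(sum = sum(!!target), total = n()) %>%
    mutate(percentage = sum * 100 / total)
}

# Rebuild the example data from the question on a copy of mtcars
cars <- mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean(mpg), 1, 0))

pivot_enquo <- get_pivot_enquo(cars, cyl, mpg_cat)
```

With the question's data this should produce the same table as the {{ }} version; the curly-curly form simply does the capture and the splice in one step.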

https://www.tidyverse.org/articles/2019/06/rlang-0-4-0/
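The question also tried passing the column names as strings, as in get_pivot(mtcars, "cyl", "mpg_cat"). {{ }} is meant for bare column names, so one possible approach for string input is the .data pronoun, which looks a column up by name. The sketch below is an assumption about what such a variant could look like, not part of the original answer; get_pivot_chr, the group column name, and cars are hypothetical.

```r
library(dplyr)

# Hypothetical helper: predictor and target are column names given as strings.
get_pivot_chr <- function(data, predictor, target) {
  data %>%
    # .data[[name]] fetches the column whose name is stored in the string
    group_by(group = as.factor(.data[[predictor]])) %>%
    summarise(sum = sum(.data[[target]]), total = n()) %>%
    mutate(percentage = sum * 100 / total)
}

# Rebuild the example data from the question on a copy of mtcars
cars <- mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean(mpg), 1, 0))

pivot_chr <- get_pivot_chr(cars, "cyl", "mpg_cat")
```

Naming the grouping column explicitly (group here) avoids a header like `as.factor(.data[["cyl"]])` in the result.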
