How to define a function in dplyr?
Question
I created a simple pivot table with the dplyr package in R. Here is my working example:
library(dplyr)
mean_mpg <- mean(mtcars$mpg)
# Create a new variable flagging whether miles per (US) gallon exceeds the mean
mtcars <-
  mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean_mpg, 1, 0))
mtcars %>%
  group_by(as.factor(cyl)) %>%
  summarise(sum = sum(mpg_cat), total = n()) %>%
  mutate(percentage = sum * 100 / total)
Now I want to write a function so that I can reuse this code:
get_pivot <- function(data, predictor, target) {
  result <-
    data %>%
    group_by(as.factor(predictor)) %>%
    summarise(sum = sum(target), total = n()) %>%
    mutate(percentage = sum * 100 / total);
  print(result)
}
but I receive the following error:
Error in is.factor(x) : object 'cyl' not found
I also tried
get_pivot(mtcars, "cyl", "mpg_cat")
but it does not work.
What should I do?
Recommended answer
If you have rlang v0.4.0 (released June 2019) or later, you can use double curly braces {{ }} (aka "curly curly") to make programming with dplyr easier.
# Note: needs installation of rlang 0.4.0 or later
get_pivot <- function(data, predictor, target) {
  result <-
    data %>%
    group_by(as.factor( {{ predictor }} )) %>%
    summarise(sum = sum( {{ target }} ), total = n()) %>%
    mutate(percentage = sum * 100 / total);
  print(result)
}
# Edit -- thank you Rui Barradas
> get_pivot(mtcars, cyl, mpg_cat)
# A tibble: 3 x 4
  `as.factor(cyl)`   sum total percentage
  <fct>            <dbl> <int>      <dbl>
1 4                   11    11      100
2 6                    3     7       42.9
3 8                    0    14        0
The reason this is required is that dplyr and other tidyverse packages use "non-standard evaluation", as do some base R functions, for example lm(mpg ~ factor(am), data = mtcars). This practice often makes interactive code shorter, simpler, and easier to read, but at the cost of making programming more complicated. Inside a function, a bare argument such as predictor is evaluated as an ordinary R object, and cyl does not exist outside the data frame, hence the "object 'cyl' not found" error. The {{ }} operator serves to transport the column you specify into the context of the function, so that it is looked up in the data.
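For readers on an rlang version older than 0.4.0, here is a minimal sketch of the same idea using the older enquo() and !! pattern, which {{ }} abbreviates. The names get_pivot_enquo and mtcars_cat are made up for this illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical helper data: mtcars plus the mpg_cat flag from the question.
mtcars_cat <- mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean(mpg), 1, 0))

# Hypothetical name; same logic as get_pivot(), written with enquo() and !!.
get_pivot_enquo <- function(data, predictor, target) {
  # Capture the expressions the caller typed (e.g. cyl) without evaluating them.
  predictor <- enquo(predictor)
  target <- enquo(target)
  data %>%
    # !! unquotes the captured expressions so they are evaluated inside the data.
    group_by(as.factor(!!predictor)) %>%
    summarise(sum = sum(!!target), total = n()) %>%
    mutate(percentage = sum * 100 / total)
}

res_enquo <- get_pivot_enquo(mtcars_cat, cyl, mpg_cat)
print(res_enquo)
```

This version returns the tibble instead of printing it inside the function, which makes the result easier to reuse. That is a design choice of the sketch, not something the answer prescribes.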
https://www.tidyverse.org/articles/2019/06/rlang-0-4-0/
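The question also tried passing the column names as strings. {{ }} is meant for bare column names, so for strings one option is the .data pronoun, which looks a column up by name. A rough sketch, assuming a hypothetical variant called get_pivot_chr and a helper data frame mtcars_cat (neither is from the original answer):

```r
library(dplyr)

# Hypothetical helper data: mtcars plus the mpg_cat flag from the question.
mtcars_cat <- mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean(mpg), 1, 0))

# Hypothetical variant of get_pivot() that takes column names as strings.
get_pivot_chr <- function(data, predictor, target) {
  data %>%
    # .data[[name]] retrieves the column whose name is stored in the string.
    group_by(group = as.factor(.data[[predictor]])) %>%
    summarise(sum = sum(.data[[target]]), total = n()) %>%
    mutate(percentage = sum * 100 / total)
}

res_chr <- get_pivot_chr(mtcars_cat, "cyl", "mpg_cat")
print(res_chr)
```

The grouping column is named group here only so that the output header does not show the .data expression. Any other name would work equally well.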