dplyr: How to use group_by inside a function?
Question
I want to use the dplyr::group_by function inside another function, but I do not know how to pass the arguments to this function.
Can someone provide a working example?
library(dplyr)
data(iris)
iris %.% group_by(Species) %.% summarise(n = n()) #
## Source: local data frame [3 x 2]
## Species n
## 1 virginica 50
## 2 versicolor 50
## 3 setosa 50
mytable0 <- function(x, ...) x %.% group_by(...) %.% summarise(n = n())
mytable0(iris, "Species") # OK
## Source: local data frame [3 x 2]
## Species n
## 1 virginica 50
## 2 versicolor 50
## 3 setosa 50
mytable1 <- function(x, key) x %.% group_by(as.name(key)) %.% summarise(n = n())
mytable1(iris, "Species") # Wrong!
# Error: unsupported type for column 'as.name(key)' (SYMSXP)
mytable2 <- function(x, key) x %.% group_by(key) %.% summarise(n = n())
mytable2(iris, "Species") # Wrong!
# Error: index out of bounds
Recommended answer
For programming, group_by_ is the counterpart to group_by:
library(dplyr)
mytable <- function(x, ...) x %>% group_by_(...) %>% summarise(n = n())
mytable(iris, "Species")
# or iris %>% mytable("Species")
giving:
Species n
1 setosa 50
2 versicolor 50
3 virginica 50
Update: At the time this was written, dplyr used %.%, which is what was originally used above, but %>% is now favored, so the code above has been changed to keep it relevant.
Update 2: regroup is now deprecated; use group_by_ instead.
Update 3: group_by_(list(...)) now becomes group_by_(...) in new versions of dplyr, as per Roberto's comment.
Update 4: Added a minor variation suggested in the comments.
Update 5: With rlang/tidyeval it is now possible to do this:
library(rlang)
mytable <- function(x, ...) {
  # c(...) collects the quoted names so syms() works for one or more columns
  group_ <- syms(c(...))
  x %>%
    group_by(!!!group_) %>%
    summarise(n = n())
}
mytable(iris, "Species")
or passing Species unevaluated, i.e. with no quotes around it:
library(rlang)
mytable <- function(x, ...) {
  group_ <- enquos(...)
  x %>%
    group_by(!!!group_) %>%
    summarise(n = n())
}
mytable(iris, Species)
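Because enquos(...) captures every argument passed through the dots, the same approach should also handle several grouping variables at once. A small sketch follows; the mtcars data set, the function name mytable_multi and the .groups = "drop" argument (which assumes dplyr 1.0.0 or later) are my additions for illustration, not part of the original answer.

```r
library(dplyr)
library(rlang)

# Hypothetical name; follows the enquos() approach with more than one group.
mytable_multi <- function(x, ...) {
  group_ <- enquos(...)          # capture the unevaluated grouping expressions
  x %>%
    group_by(!!!group_) %>%      # splice them into group_by()
    summarise(n = n(), .groups = "drop")  # assumes dplyr >= 1.0.0
}

# Count rows for each combination of cylinders and gears.
res <- mytable_multi(mtcars, cyl, gear)
print(res)
```

Each row of the result is one cyl/gear combination that occurs in the data, with n giving its row count.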
Update 6: There is now a {{...}} notation that works if there is just one grouping variable:
mytable <- function(x, group) {
  x %>%
    group_by({{group}}) %>%
    summarise(n = n())
}
mytable(iris, Species)
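When the grouping column arrives as a string rather than a bare name, as in the original question, {{ }} does not apply directly. One option in recent dplyr is the .data pronoun for a single column, or across(all_of(...)) for a character vector of columns. The sketch below assumes dplyr 1.0.0 or later; the function names mytable_chr and mytable_chrs are hypothetical and chosen only for this example.

```r
library(dplyr)

# Hypothetical helper: a single column name supplied as a string.
mytable_chr <- function(x, key) {
  x %>%
    group_by(.data[[key]]) %>%   # look the column up by name
    summarise(n = n())
}

# Hypothetical helper: several column names supplied as a character vector.
mytable_chrs <- function(x, keys) {
  x %>%
    group_by(across(all_of(keys))) %>%   # assumes dplyr >= 1.0.0
    summarise(n = n(), .groups = "drop")
}

res1 <- mytable_chr(iris, "Species")
res2 <- mytable_chrs(mtcars, c("cyl", "gear"))
print(res1)
print(res2)
```

Using all_of() makes the function fail with an error if a requested column is missing, rather than silently grouping by something else.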