Pass arguments to dplyr functions
Question
I want to use dplyr to parameterise the following computation, which finds the values of Sepal.Length that are associated with more than one value of Sepal.Width:
library(dplyr)

iris %>%
  group_by(Sepal.Length) %>%
  summarise(n.uniq = n_distinct(Sepal.Width)) %>%
  filter(n.uniq > 1)
Normally I would write something like this:
not.uniq.per.group <- function(data, group.var, uniq.var) {
  data %>%
    group_by(group.var) %>%
    summarise(n.uniq = n_distinct(uniq.var)) %>%
    filter(n.uniq > 1)
}
However, this approach throws errors because dplyr uses non-standard evaluation. How should this function be written?
Answer
You need to use the standard-evaluation versions of the dplyr functions (just append '_' to the function names, i.e. group_by_ and summarise_) and pass strings to your function, which you then need to turn into symbols. To parameterise the argument of summarise_, you will need interp() from the lazyeval package. Concretely:
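library(dplyr)
library(lazyeval)  # provides interp()

not.uniq.per.group <- function(df, grp.var, uniq.var) {
  df %>%
    # group_by_() accepts the grouping column name as a string
    group_by_(grp.var) %>%
    # interp() builds ~n_distinct(v), with v replaced by the symbol named in uniq.var
    summarise_(n_uniq = interp(~n_distinct(v), v = as.name(uniq.var))) %>%
    filter(n_uniq > 1)
}

not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")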
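The underscored verbs and lazyeval were deprecated in dplyr 0.7.0 in favour of tidy evaluation. On dplyr 1.0 or later, the same function can be written with the `.data` pronoun. This is a minimal sketch, not part of the original answer. Like the answer's version, it still takes the column names as strings.

```r
library(dplyr)

# Alternative sketch for dplyr >= 1.0: .data[[name]] looks up a column by a string name,
# so grp.var and uniq.var are character strings, as in the answer above
not.uniq.per.group <- function(df, grp.var, uniq.var) {
  df %>%
    group_by(.data[[grp.var]]) %>%
    summarise(n_uniq = n_distinct(.data[[uniq.var]])) %>%
    filter(n_uniq > 1)
}

not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")
```

Because `.data[[...]]` takes an ordinary character value, neither `as.name()` nor `interp()` is needed.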
See the dplyr vignette on non-standard evaluation for more details.
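You may prefer to pass bare column names, as in the question's original attempt. In that case, recent dplyr versions support the embracing operator `{{ }}` from rlang (available since rlang 0.4.0). The sketch below assumes dplyr 1.0 or later and is an alternative to the answer's approach, not a restatement of it.

```r
library(dplyr)

# Sketch assuming dplyr >= 1.0: {{ }} forwards the caller's bare column names
# into group_by() and n_distinct()
not.uniq.per.group <- function(df, grp.var, uniq.var) {
  df %>%
    group_by({{ grp.var }}) %>%
    summarise(n_uniq = n_distinct({{ uniq.var }})) %>%
    filter(n_uniq > 1)
}

# Column names are passed unquoted
not.uniq.per.group(iris, Sepal.Length, Sepal.Width)
```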