Refactor R code when library functions use non-standard evaluation
Question
I have some R code that looks like this:
library(dplyr)
library(datasets)
iris %.% group_by(Species) %.% filter(rank(Petal.Length, ties.method = 'random')<=2) %.% ungroup()
This gives:
Source: local data frame [6 x 5]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 4.3 3.0 1.1 0.1 setosa
2 4.6 3.6 1.0 0.2 setosa
3 5.0 2.3 3.3 1.0 versicolor
4 5.1 2.5 3.0 1.1 versicolor
5 4.9 2.5 4.5 1.7 virginica
6 6.0 3.0 4.8 1.8 virginica
This groups by species, and for each group keeps only the two with the shortest Petal.Length. I have some duplication in my code, because I do this several times for different columns and numbers. E.g.:
iris %.% group_by(Species) %.% filter(rank(Petal.Length, ties.method = 'random')<=2) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(-Petal.Length, ties.method = 'random')<=2) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(Petal.Width, ties.method = 'random')<=3) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(-Petal.Width, ties.method = 'random')<=3) %.% ungroup()
I want to extract this into a function. The naive approach doesn't work:
keep_min_n_by_species <- function(expr, n) {
  iris %.% group_by(Species) %.% filter(rank(expr, ties.method = 'random') <= n) %.% ungroup()
}
keep_min_n_by_species(Petal.Width, 2)
Error in filter_impl(.data, dots(...), environment()) :
object 'Petal.Width' not found
As I understand it, the expression rank(Petal.Length, ties.method = 'random') <= 2 is evaluated in a different context, introduced by the filter function, that provides a meaning for the Petal.Length expression. I can't just swap in a variable for Petal.Length, because it will be evaluated in the wrong context. I've tried using different combinations of substitute and eval, having read this page: Non-standard evaluation. I can't figure out an appropriate combination. I think the problem might be that I don't just want to pass an expression from the caller (Petal.Length) through to filter for it to evaluate; I want to construct a new, bigger expression (rank(Petal.Length, ties.method = 'random') <= 2) and then pass that whole expression through to filter for it to evaluate.
- How can I refactor this expression into a function?
- More generally, how should I go about extracting an R expression into a function?
- Even more generally, am I approaching this with the wrong mindset? In more mainstream languages I'm familiar with (e.g. Python, C++, C#), this is a relatively straightforward operation that I want to do all the time to remove duplication in my code. In R it seems (to me, at least) that non-standard evaluation can make it a very non-obvious operation. Should I be doing something else entirely?
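For reference, the idea described above (capture the caller's expression, build the larger rank(...) <= n expression around it, then evaluate that in a context where the column names have meaning) can be sketched in base R without dplyr. This is only an illustration under stated assumptions: the function name keep_min_n_by_group and its data and group arguments are hypothetical and do not come from the original code.

```r
# Hypothetical helper: keep the n rows with the smallest value of `expr`
# within each level of the grouping column `group` (a column name as a string).
keep_min_n_by_group <- function(data, group, expr, n) {
  expr <- substitute(expr)   # capture the unevaluated argument, e.g. -Petal.Width
  env  <- parent.frame()     # caller's environment, used for any non-column names
  # Build the bigger expression: rank(<expr>, ties.method = "random") <= <n>
  cond <- bquote(rank(.(expr), ties.method = "random") <= .(n))
  pieces <- split(data, data[[group]])
  kept <- lapply(pieces, function(d) {
    # Evaluate the condition with the group's columns visible as variables
    d[eval(cond, d, env), , drop = FALSE]
  })
  out <- do.call(rbind, kept)
  rownames(out) <- NULL
  out
}

keep_min_n_by_group(iris, "Species", Petal.Width, 2)   # two narrowest petals per species
keep_min_n_by_group(iris, "Species", -Petal.Width, 3)  # three widest petals per species
```

The key step is that substitute() captures the argument before it is evaluated, and bquote() splices it into a new call, so the whole condition is only evaluated later, inside each group's data frame.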
Recommended answer
dplyr version 0.3 is beginning to address this using the lazyeval package, as @baptiste mentioned, and a new family of functions that use standard evaluation (same function names as the NSE versions, but ending in _). There is a vignette here: https://github.com/hadley/dplyr/blob/master/vignettes/nse.Rmd
All that being said, I don't know the best practices for what you're trying to do (though I'm trying to do the same thing). I have something working, but as I said, I don't know if it's the best way to do it. Note the use of filter_() instead of filter(), and passing in the argument as a quoted character string:
devtools::install_github("hadley/dplyr")
devtools::install_github("hadley/lazyeval")
library(dplyr)
library(lazyeval)
keep_min_n_by_species <- function(expr, n, rev = FALSE) {
  iris %>%
    group_by(Species) %>%
    filter_(interp(~rank(if (rev) -x else x, ties.method = 'random') <= y, # filter_, not filter
                   x = as.name(expr), y = n)) %>%
    ungroup()
}
keep_min_n_by_species("Petal.Width", 3) # "Petal.Width" as character string
keep_min_n_by_species("Petal.Width", 3, rev = TRUE)
Update based on @hadley's comment:
keep_min_n_by_species <- function(expr, n) {
  expr <- lazy(expr)  # capture the unevaluated argument
  formula <- interp(~rank(x, ties.method = 'random') <= y,
                    x = expr, y = n)
  iris %>%
    group_by(Species) %>%
    filter_(formula) %>%
    ungroup()
}
keep_min_n_by_species(Petal.Width, 3)
keep_min_n_by_species(-Petal.Width, 3)
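A note for readers on newer versions: later dplyr releases deprecated the underscore verbs such as filter_() in favour of tidy evaluation. Assuming dplyr 1.0 or later is installed, the same function can probably be written with the embrace operator {{ }}. The name keep_min_n_by_species_tidy below is made up for this sketch, and this is not the code from the answer above.

```r
library(dplyr)

# Hypothetical tidy-evaluation variant, assuming dplyr >= 1.0.
keep_min_n_by_species_tidy <- function(expr, n) {
  iris %>%
    group_by(Species) %>%
    # {{ expr }} forwards the caller's unevaluated expression into filter()
    filter(rank({{ expr }}, ties.method = "random") <= n) %>%
    ungroup()
}

keep_min_n_by_species_tidy(Petal.Width, 3)
keep_min_n_by_species_tidy(-Petal.Width, 3)
```

With this approach the column expression is passed unquoted, as in the updated lazyeval version, and no separate formula needs to be built by hand.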