Refactor R code when library functions use non-standard evaluation

Published: 2017/7/13 21:04:15 Tags: r dplyr

Question

I have some R code that looks like this:

library(dplyr)
library(datasets)

iris %.% group_by(Species) %.% filter(rank(Petal.Length, ties.method = 'random')<=2) %.% ungroup()

This gives:

Source: local data frame [6 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1          4.3         3.0          1.1         0.1     setosa
2          4.6         3.6          1.0         0.2     setosa
3          5.0         2.3          3.3         1.0 versicolor
4          5.1         2.5          3.0         1.1 versicolor
5          4.9         2.5          4.5         1.7  virginica
6          6.0         3.0          4.8         1.8  virginica

This groups by species, and for each group keeps only the two with the shortest Petal.Length. I have some duplication in my code, because I do this several times for different columns and numbers. E.g.:

iris %.% group_by(Species) %.% filter(rank(Petal.Length, ties.method = 'random')<=2) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(-Petal.Length, ties.method = 'random')<=2) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(Petal.Width, ties.method = 'random')<=3) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(-Petal.Width, ties.method = 'random')<=3) %.% ungroup()

I want to extract this into a function. The naive approach doesn't work:

keep_min_n_by_species <- function(expr, n) {
  iris %.% group_by(Species) %.% filter(rank(expr, ties.method = 'random') <= n) %.% ungroup()
}

keep_min_n_by_species(Petal.Width, 2)

Error in filter_impl(.data, dots(...), environment()) : 
  object 'Petal.Width' not found

As I understand it, the expression rank(Petal.Length, ties.method = 'random') <= 2 is evaluated in a different context, introduced by the filter function, that provides a meaning for the Petal.Length expression. I can't just swap in a variable for Petal.Length, because it will be evaluated in the wrong context. I've tried using different combinations of substitute and eval, having read this page: Non-standard evaluation. I can't figure out an appropriate combination. I think the problem might be that I don't just want to pass through an expression from the caller (Petal.Length) through to filter for it to evaluate - I want to construct a new bigger expression (rank(Petal.Length, ties.method = 'random') <= 2) and then pass that whole expression through to filter for it to evaluate.

How can I refactor this expression into a function?
More generally, how should I go about extracting an R expression into a function?
Even more generally, am I approaching this with the wrong mindset? In more mainstream languages I'm familiar with (e.g. Python, C++, C#), this is a relatively straightforward operation that I want to do all the time to remove duplication in my code. In R it seems (to me, at least) that non-standard evaluation can make it a very non-obvious operation. Should I be doing something else entirely?

Recommended answer

dplyr version 0.3 is beginning to address this using the lazyeval package, as @baptiste mentioned, and a new family of functions that use standard evaluation (same function names as the NSE versions, but ending in _). There is a vignette here: https://github.com/hadley/dplyr/blob/master/vignettes/nse.Rmd

All that being said, I don't know best practices for what you're trying to do (though I'm trying to do the same thing). I have something working, but like I said, I don't know if it's the best way to do it. Note the use of filter_() instead of filter(), and passing in the argument as a quoted character string:

devtools::install_github("hadley/dplyr")
devtools::install_github("hadley/lazyeval")

library(dplyr)
library(lazyeval)

keep_min_n_by_species <- function(expr, n, rev = FALSE) {
  iris %>% 
    group_by(Species) %>% 
    filter_(interp(~rank(if (rev) -x else x, ties.method = 'random') <= y, # filter_, not filter
                   x = as.name(expr), y = n)) %>% 
    ungroup()
}

keep_min_n_by_species("Petal.Width", 3) # "Petal.Width" as character string
keep_min_n_by_species("Petal.Width", 3, rev = TRUE)
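To make it clearer what interp() is assembling here, the same idea can be sketched in base R alone, without dplyr or lazyeval. This is only an illustration, not part of the original answer: the helper name keep_min_n_base and the use of split()/lapply() for the grouping are my own assumptions. It builds the unevaluated call rank(<column>, ties.method = "random") <= n with bquote() and then evaluates it once per species, with that group's columns in scope.

```r
# Base-R sketch (hypothetical helper, not from the original answer).
keep_min_n_base <- function(col, n, rev = FALSE) {
  x <- as.name(col)                 # "Petal.Width" -> the symbol Petal.Width
  if (rev) x <- bquote(- .(x))      # becomes -Petal.Width, to keep the largest values

  # Build the unevaluated call: rank(<x>, ties.method = "random") <= <n>
  cond <- bquote(rank(.(x), ties.method = "random") <= .(n))

  # Evaluate the call once per species; the group's data frame supplies the columns
  groups <- split(iris, iris$Species)
  kept <- lapply(groups, function(g) g[eval(cond, g), , drop = FALSE])
  do.call(rbind, kept)
}

res <- keep_min_n_base("Petal.Width", 3)
table(res$Species)                  # number of rows kept per species
```

Because ties.method = 'random' gives every row in a group a distinct rank, exactly n rows per species are kept, although which of several tied rows survive can differ between runs.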

Update based on @hadley's comment:

keep_min_n_by_species <- function(expr, n) {
  expr <- lazy(expr)

  formula <- interp(~rank(x, ties.method = 'random') <= y,
                    x = expr, y = n)

  iris %>% 
    group_by(Species) %>% 
    filter_(formula) %>% 
    ungroup()
}

keep_min_n_by_species(Petal.Width, 3)
keep_min_n_by_species(-Petal.Width, 3)
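A note for readers on current versions: filter_() and the other underscore verbs have since been deprecated in dplyr in favour of tidy evaluation. Assuming a recent dplyr (1.0 or later, which is my assumption about your setup and not something the answer above covers), the same function can be sketched with the embrace operator {{ }}. The name keep_min_n_tidy is hypothetical.

```r
library(dplyr)

# Sketch for recent dplyr versions (assumed >= 1.0); not part of the original answer.
keep_min_n_tidy <- function(expr, n) {
  iris %>%
    group_by(Species) %>%
    # {{ expr }} forwards the caller's unevaluated expression into filter(),
    # where it is evaluated against the columns of each group
    filter(rank({{ expr }}, ties.method = "random") <= n) %>%
    ungroup()
}

keep_min_n_tidy(Petal.Width, 3)    # three smallest Petal.Width per species
keep_min_n_tidy(-Petal.Width, 3)   # three largest Petal.Width per species
```

As in the lazyeval version, the caller passes a bare expression, so negating the column is enough to keep the largest values.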
