Refactor R code when library functions use non-standard evaluation

Question

I have some R code that looks like this:

library(dplyr)
library(datasets)

iris %.% group_by(Species) %.% filter(rank(Petal.Length, ties.method = 'random')<=2) %.% ungroup()

This gives:

Source: local data frame [6 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1          4.3         3.0          1.1         0.1     setosa
2          4.6         3.6          1.0         0.2     setosa
3          5.0         2.3          3.3         1.0 versicolor
4          5.1         2.5          3.0         1.1 versicolor
5          4.9         2.5          4.5         1.7  virginica
6          6.0         3.0          4.8         1.8  virginica

This groups by species, and for each group keeps only the two with the shortest Petal.Length. I have some duplication in my code, because I do this several times for different columns and numbers. E.g.:

iris %.% group_by(Species) %.% filter(rank(Petal.Length, ties.method = 'random')<=2) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(-Petal.Length, ties.method = 'random')<=2) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(Petal.Width, ties.method = 'random')<=3) %.% ungroup()
iris %.% group_by(Species) %.% filter(rank(-Petal.Width, ties.method = 'random')<=3) %.% ungroup()

I want to extract this into a function. The naive approach doesn't work:

keep_min_n_by_species <- function(expr, n) {
  iris %.% group_by(Species) %.% filter(rank(expr, ties.method = 'random') <= n) %.% ungroup()
}

keep_min_n_by_species(Petal.Width, 2)

Error in filter_impl(.data, dots(...), environment()) : 
  object 'Petal.Width' not found 

As I understand it, the expression rank(Petal.Length, ties.method = 'random') <= 2 is evaluated in a different context, introduced by the filter function, that provides a meaning for the Petal.Length expression. I can't just swap in a variable for Petal.Length, because it will be evaluated in the wrong context. I've tried using different combinations of substitute and eval, having read this page: Non-standard evaluation. I can't figure out an appropriate combination. I think the problem might be that I don't just want to pass through an expression from the caller (Petal.Length) through to filter for it to evaluate - I want to construct a new bigger expression (rank(Petal.Length, ties.method = 'random') <= 2) and then pass that whole expression through to filter for it to evaluate.


  1. How can I refactor this expression into a function?
  2. More generally, how should I go about extracting an R expression into a function?
  3. Even more generally, am I approaching this with the wrong mindset? In more mainstream languages I'm familiar with (e.g. Python, C++, C#), this is a relatively straightforward operation that I want to do all the time to remove duplication in my code. In R it seems (to me, at least) that non-standard evaluation can make it a very non-obvious operation. Should I be doing something else entirely?
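
For illustration, the idea described above (wrap the caller's expression in a larger one, then evaluate the whole thing where the column names have meaning) can be sketched in base R with substitute, bquote and eval. This is only a sketch under assumptions: keep_min_n_by_group is a hypothetical helper, not part of dplyr, and it uses ties.method = "first" instead of 'random' so the result is reproducible.

```r
# Sketch only: keep_min_n_by_group is a hypothetical helper, not a dplyr function.
keep_min_n_by_group <- function(df, group, expr, n) {
  col_expr <- substitute(expr)   # capture e.g. Petal.Width or -Petal.Width unevaluated
  env <- parent.frame()          # where to look up anything that is not a column
  # Build the larger expression: rank(<expr>, ties.method = "first") <= <n>
  cond <- bquote(rank(.(col_expr), ties.method = "first") <= .(n))
  pieces <- lapply(split(df, df[[group]]), function(piece) {
    keep <- eval(cond, piece, env)   # the columns of `piece` are visible here
    piece[keep, , drop = FALSE]
  })
  do.call(rbind, pieces)
}

smallest <- keep_min_n_by_group(iris, "Species", Petal.Width, 3)
largest  <- keep_min_n_by_group(iris, "Species", -Petal.Width, 2)
```

Here substitute captures the argument before it is evaluated, bquote splices it into the rank(...) <= n call, and eval runs that call once per group with the group's columns in scope.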


Recommended answer

dplyr version 0.3 is beginning to address this using the lazyeval package, as @baptiste mentioned, and a new family of functions that use standard evaluation (same function names as the NSE versions, but ending in _). There is a vignette here: https://github.com/hadley/dplyr/blob/master/vignettes/nse.Rmd

All that being said, I don't know best practices for what you're trying to do (though I'm trying to do the same thing). I have something working, but like I said, I don't know if it's the best way to do it. Note the use of filter_() instead of filter(), and passing in the argument as a quoted character string:

devtools::install_github("hadley/dplyr")
devtools::install_github("hadley/lazyeval")

library(dplyr)
library(lazyeval)

keep_min_n_by_species <- function(expr, n, rev = FALSE) {
  iris %>% 
    group_by(Species) %>% 
    filter_(interp(~rank(if (rev) -x else x, ties.method = 'random') <= y, # filter_, not filter
                   x = as.name(expr), y = n)) %>% 
    ungroup()
}

keep_min_n_by_species("Petal.Width", 3) # "Petal.Width" as character string
keep_min_n_by_species("Petal.Width", 3, rev = TRUE)

Update based on @hadley's comment:

keep_min_n_by_species <- function(expr, n) {
  expr <- lazy(expr)

  formula <- interp(~rank(x, ties.method = 'random') <= y,
                    x = expr, y = n)

  iris %>% 
    group_by(Species) %>% 
    filter_(formula) %>% 
    ungroup()
}

keep_min_n_by_species(Petal.Width, 3)
keep_min_n_by_species(-Petal.Width, 3)
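
A note for readers on newer dplyr releases: the underscore verbs such as filter_() were later deprecated in favor of tidy evaluation. Assuming a dplyr version recent enough to support the {{ }} (embrace) operator from rlang, the same function might be sketched as below. The name keep_min_n_by_species_tidy and the extra data argument are assumptions made for this example, not part of the original answer.

```r
library(dplyr)

# Sketch only: hypothetical variant of the function above for newer dplyr.
keep_min_n_by_species_tidy <- function(data, expr, n) {
  data %>%
    group_by(Species) %>%
    # {{ expr }} forwards the caller's unevaluated expression into filter()
    filter(rank({{ expr }}, ties.method = "random") <= n) %>%
    ungroup()
}

smallest3 <- keep_min_n_by_species_tidy(iris, Petal.Width, 3)
largest3  <- keep_min_n_by_species_tidy(iris, -Petal.Width, 3)
```

Because ties are broken at random, which rows are kept can differ between runs when values tie, but each species always contributes exactly n rows.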
