String based filtering in dplyr - NSE


Problem description

I'd like to use dplyr's new NSE notations (version >= 0.6) for a dynamic filter on my data. Let's say I have the following dummy dataset:

library(dplyr)
df = tibble(x = 1:10, y = 10:1, z = 10 * runif(10))  # data_frame() is deprecated in favour of tibble()

If I now want to filter the column named by tofilter = "x" for values greater than or equal to 5, I know I can do:

df %>% 
  filter((!!rlang::sym(tofilter)) >= 5)

Question 1

What if I want to dynamically change the operator of the filtering too? Let's say I have a Shiny App in which the user can choose, via a selectInput, whether to filter the data for values greater than 5, equal to 5 or lower than 5.

What I'd like to do is something along the lines of:

op = ">="
val = 5
filt_expr = paste("x", op, val)
df %>% 
  filter(filt_expr)

Obviously, this does not work. I have played a bit with the rlang quosures/symbols, etc., but didn't quite find the right way to "quote" my inputs.

Question 2

Bonus question: what if I want to apply multiple filters? Do I need to loop, or can I create a list of filtering expressions and apply them all in one go?

An example of this is a Shiny App where the user can type multiple conditions he/she wants to apply to the data so that we have a dynamically changing list of the format:

filt_expr_list = list("x >= 5", "y <= 10", "z >= 2")

and we want to dynamically apply them all, so that the output is equivalent to:

df %>%
  filter(x >= 5, y <= 10, z >= 2)

I guess this is in a certain sense a subset of question 1 since when I know how to correctly quote the arguments I think I could do something like:

filt_expr = paste0(unlist(filt_expr_list), collapse = ", ")
df %>%
  filter(filt_expr)

but it would be nice to see if there is a cleaner way.

Solution

What if I want to dynamically change the operator of the filtering too

You can do it with tidy eval by unquoting a symbol representing the operator (note that I use expr() to illustrate the result of the unquoting):

lhs <- "foo"

# Storing the symbol `<` in `op`
op <- quote(`<`)

expr(`!!`(op)(!!sym(lhs), 5))
#> foo < 5

However, it is cleaner to do it outside tidy eval with regular R code. Unquoting is only necessary when the symbol you unquote represents a column from the data frame, i.e. something that's not in the calling context. Here you can just store the operator function in a variable and then call that variable in your filtering expression:

# Storing the function `<` in `op`
op <- `<`

expr(op(!!sym(lhs), 5))
#> op(foo, 5)
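
To connect this to the Shiny scenario in the question, here is a minimal sketch (not part of the original answer) of turning a user-selected operator string into a working filter. The names `tofilter`, `op`, `val`, `allowed_ops` and `op_fn` are illustrative assumptions. `call2()` and `sym()` come from rlang, and `match.fun()` is base R. The whitelist check is an extra precaution I am assuming you would want when the operator comes from user input.

```r
library(dplyr)
library(rlang)

df <- tibble(x = 1:10, y = 10:1)

# Hypothetical user inputs, e.g. taken from selectInput()/numericInput()
tofilter <- "x"
op <- ">="
val <- 5

# Assumed whitelist so that only comparison operators are accepted
allowed_ops <- c("<", "<=", "==", ">=", ">")
stopifnot(op %in% allowed_ops)

# Approach 1: build the call `x >= 5` and unquote it in filter()
cond <- call2(op, sym(tofilter), val)
res1 <- df %>% filter(!!cond)

# Approach 2: look up the operator function and call it directly;
# only the column name needs to be unquoted
op_fn <- match.fun(op)
res2 <- df %>% filter(op_fn(!!sym(tofilter), val))
```

Both approaches should keep the same rows. The second one mirrors the "regular R code" suggestion above, since the operator is just a function stored in a variable.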

what if I want to apply multiple filters?

You save the expressions in a list and then you splice them in a call with !!!:

filters <- list(
  quote(x >= 5),
  quote(y <= 10),
  quote(z >= 2)
)

expr(df %>% filter(!!!filters))
#> df %>% filter(x >= 5, y <= 10, z >= 2)
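
The question starts from a list of strings, not quoted expressions, so one possible bridge is to parse each string first. The sketch below is an assumption about how that could look, not the answer author's code. It uses rlang's `parse_expr()` to turn each string into an expression and then splices the list with `!!!`. The `z` column is made deterministic here purely so the result is reproducible.

```r
library(dplyr)
library(rlang)

# Deterministic stand-in for the question's data (z alternates 1, 3)
df <- tibble(x = 1:10, y = 10:1, z = rep(c(1, 3), 5))

# Conditions as strings, as in the question
filt_expr_list <- list("x >= 5", "y <= 10", "z >= 2")

# Parse each string into an R expression
filters <- lapply(filt_expr_list, parse_expr)

# Splice all conditions into a single filter() call
res <- df %>% filter(!!!filters)
```

Note that parsing and evaluating free text typed by a user amounts to running arbitrary R code, so in a real Shiny app it is probably safer to build the conditions from validated pieces (column name, operator, value) than to parse raw input.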

Note: I said above that it is not necessary to unquote variables from the context, but it is still often a good idea to do so if you're writing a function that takes the data frame as input. Since the data frame is variable, you don't know in advance what columns it contains, and columns always take precedence over objects you have defined in the environment. In the case here this is not an issue, because we are calling a function: if R finds a similarly named non-function object in the data frame, it keeps looking until it finds a function.
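
The precedence issue described in the note can be illustrated with a small sketch. The column name `val` is a deliberately contrived assumption that clashes with a variable in the environment. Besides unquoting with `!!`, the `.data` and `.env` pronouns (provided by rlang and usable inside dplyr verbs) are another way to say explicitly where a name should be looked up.

```r
library(dplyr)

# Contrived data frame with a column that shadows the variable `val`
df <- tibble(x = 1:10, val = 100)
val <- 5

# `val` resolves to the column (100), so no row passes
masked <- df %>% filter(x >= val)

# Unquoting inlines the environment value 5 into the expression
unquoted <- df %>% filter(x >= !!val)

# Pronouns make both lookups explicit
explicit <- df %>% filter(.data[["x"]] >= .env$val)
```

Here `masked` ends up empty, while the other two keep the rows where `x` is at least 5.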
