Obsolete data mask. Too late to resolve `xxxxxx` after the end of `dplyr::mutate()`
Problem description
As part of my answer to this post, I suggested a completely generic mechanism by which one data frame could be filtered by conditions stored in another. The OP has called me out (damn!) and asked me for an implementation.
My solution requires me to store functions in the filter data frame. This is possible: this post shows how.
As a basic example, consider
library(tidyverse)
longFilterTable <- tribble(
  ~var, ~value,
  "gear", list(3),
) %>%
  mutate(
    func=pmap(
      list(value),
      ~function(x) x == ..1[[1]]
    )
  )
longFilterTable
# A tibble: 1 x 3
var value func
<chr> <list> <list>
1 gear <list [1]> <fn>
This is a very convoluted way of saying "select only those rows of mtcars for which gear is 3". This works:
mtcars %>% filter(longFilterTable$func[[1]](gear))
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
<11 rows deleted for brevity>
Now suppose I want more flexibility in the criterion. I might, for example, want to select either a range of values or a fixed value. This seems to be a reasonable extension of the filter dataset above:
longFilterTable <- tribble(
  ~var, ~value, ~condition,
  "gear", list(3), "equal",
  "wt", list(3,4, 3.9), "range",
) %>%
  mutate(
    func=pmap(
      list(value, condition),
      ~function(x) {
        case_when(
          condition == "equal" ~ x == ..1[[1]],
          condition == "range" ~ x >= ..1[[1]][1] & x <= ..1[[1]][2],
          TRUE ~ x
        )
      }
    )
  )
longFilterTable
# A tibble: 2 x 4
var value condition func
<chr> <list> <chr> <list>
1 gear <list [1]> equal <fn>
2 wt <list [3]> range <fn>
But now when I try to apply the filter, I get:
mtcars %>% filter(longFilterTable$func[[1]](gear))
Error: Problem with `filter()` input `..1`.
x Obsolete data mask.
x Too late to resolve `condition` after the end of `dplyr::mutate()`.
ℹ Did you save an object that uses `condition` lazily in a column in the `dplyr::mutate()` expression ?
ℹ Input `..1` is `longFilterTable$func[[1]](gear)`.
I've played around with various combinations of deparse(), substitute(), expression(), force() and eval(), but to no avail. Can anyone find a solution?
Recommended answer
Your problem is that all options of case_when are always evaluated and checked for correct output format:
x <- 1
dplyr::case_when(x < 2 ~ TRUE,
                 x < 0 ~ FALSE)
#> [1] TRUE
dplyr::case_when(x < 2 ~ TRUE,
                 x < 0 ~ stop())
#> Error in eval_tidy(pair$rhs, env = default_env):
In your case, you want to use the first option, checking for equality. However, the range condition is also evaluated, and because no second value is stored in the value list, its outcome is a vector of NAs only, hence the error. Switching from case_when to a regular if/else clause solves this issue.
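To make the difference between the two constructs concrete, here is a minimal sketch that records which branches actually get evaluated. The helper note() and the environment branch_log are hypothetical names introduced only for this illustration; they are not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: remembers which branch was evaluated, then returns the value.
branch_log <- new.env()
assign("calls", character(), envir = branch_log)
note <- function(tag, value) {
  assign("calls", c(get("calls", envir = branch_log), tag), envir = branch_log)
  value
}

x <- 1

# case_when(): every right-hand side is evaluated, even the one that is never selected.
res_case_when <- case_when(
  x < 2 ~ note("first", TRUE),
  x < 0 ~ note("second", FALSE)
)
calls_case_when <- get("calls", envir = branch_log)

# if / else: only the branch that is actually taken is evaluated.
assign("calls", character(), envir = branch_log)
res_if <- if (x < 2) note("first", TRUE) else if (x < 0) note("second", FALSE)
calls_if <- get("calls", envir = branch_log)
```

After running this, calls_case_when should contain both tags, while calls_if should contain only "first".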
library(purrr)
library(dplyr)
longFilterTable <- tribble(
  ~var, ~value, ~condition,
  "gear", list(3), "equal",
  "wt", list(3.4, 3.9), "range",
) %>%
  mutate(
    func=pmap(
      list(value, condition),
      ~function(x) {
        if(..2 == "equal") x == ..1[[1]]
        else if (..2 == "range") x >= ..1[[1]] & x <= ..1[[2]]
        else TRUE
      }
    )
  )
mtcars %>% filter(longFilterTable$func[[2]](drat))
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
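The code above also refers to ..2 inside the stored function rather than to the column name condition, which is the name the error message's hint ("uses `condition` lazily") points at. As a hedged variation on the same idea, the sketch below builds each predicate with a small function factory so that the closure only captures plain values, and then applies every row of the filter table to its own column. The names make_predicate, masks, keep and result are hypothetical and introduced only for this illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical factory: force() captures the values now, so the returned
# closure never has to look anything up in the mutate() data mask later.
make_predicate <- function(value, condition) {
  force(value)
  force(condition)
  function(x) {
    if (condition == "equal") {
      x == value[[1]]
    } else if (condition == "range") {
      x >= value[[1]] & x <= value[[2]]
    } else {
      rep(TRUE, length(x))  # unknown condition: keep every row
    }
  }
}

longFilterTable <- tribble(
  ~var,   ~value,          ~condition,
  "gear", list(3),         "equal",
  "wt",   list(3.4, 3.9),  "range"
) %>%
  mutate(func = map2(value, condition, make_predicate))

# Apply each stored predicate to the column named in var,
# then combine the logical vectors with AND.
masks <- map2(longFilterTable$var, longFilterTable$func,
              function(v, f) f(mtcars[[v]]))
keep <- reduce(masks, `&`)
result <- mtcars[keep, ]
```

Under these assumptions, result holds the rows of mtcars for which gear is 3 and wt lies between 3.4 and 3.9. Combining the masks with AND is one possible design choice; a different reducer could be swapped in if the conditions should be alternatives.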