Does dplyr::if_else evaluate both TRUE and FALSE at the same time?
Question
Consider the following example:
library(dplyr)
# sample data
set.seed(1)
mydf <- data.frame(value = as.logical(sample(0:1, 15, replace = TRUE)), group = rep(letters[1:3],each = 5), index = 1:5)
# finds either index of first "TRUE" value by group, or the last value.
# works with base::ifelse
mydf %>% group_by(group) %>% mutate(max_value = ifelse(all(!value), max(index), index[min(which(value))]))
#> # A tibble: 15 x 4
#> # Groups: group [3]
#> value group index max_value
#> <lgl> <fct> <int> <int>
#> 1 FALSE a 1 2
#> 2 TRUE a 2 2
#> 3 FALSE a 3 2
#> 4 FALSE a 4 2
#> 5 TRUE a 5 2
#> 6 FALSE b 1 4
#> 7 FALSE b 2 4
#> 8 FALSE b 3 4
#> 9 TRUE b 4 4
#> 10 TRUE b 5 4
#> 11 FALSE c 1 5
#> 12 FALSE c 2 5
#> 13 FALSE c 3 5
#> 14 FALSE c 4 5
#> 15 FALSE c 5 5
# the same gives a warning with dplyr::if_else
mydf %>% group_by(group) %>% mutate(max_value = if_else(all(!value), max(index), index[min(which(value))]))
#> Warning in min(which(value)): no non-missing arguments to min; returning Inf
#> # A tibble: 15 x 4
#> # Groups: group [3]
#> value group index max_value
#> <lgl> <fct> <int> <int>
#> 1 FALSE a 1 2
#> 2 TRUE a 2 2
#> 3 FALSE a 3 2
#> 4 FALSE a 4 2
#> 5 TRUE a 5 2
#> 6 FALSE b 1 4
#> 7 FALSE b 2 4
#> 8 FALSE b 3 4
#> 9 TRUE b 4 4
#> 10 TRUE b 5 4
#> 11 FALSE c 1 5
#> 12 FALSE c 2 5
#> 13 FALSE c 3 5
#> 14 FALSE c 4 5
#> 15 FALSE c 5 5
As noted in the code comments, dplyr::if_else produces this warning:
Warning in min(which(value)): no non-missing arguments to min; returning Inf
After removing the "all FALSE" group c, the warning no longer appears:
mydf_allTRUE <- mydf
mydf_allTRUE[14, 'value'] <- TRUE
mydf_allTRUE %>% group_by(group) %>% mutate(max_value = if_else(all(!value), max(index), index[min(which(value))]))
#> # A tibble: 15 x 4
#> # Groups: group [3]
#> value group index max_value
#> <lgl> <fct> <int> <int>
#> 1 FALSE a 1 2
#> 2 TRUE a 2 2
#> 3 FALSE a 3 2
#> 4 FALSE a 4 2
#> 5 TRUE a 5 2
#> 6 FALSE b 1 4
#> 7 FALSE b 2 4
#> 8 FALSE b 3 4
#> 9 TRUE b 4 4
#> 10 TRUE b 5 4
#> 11 FALSE c 1 4
#> 12 FALSE c 2 4
#> 13 FALSE c 3 4
#> 14 TRUE c 4 4
#> 15 FALSE c 5 4
Created on 2019-12-22 by the reprex package (v0.3.0)
What confuses me is that I believe I constructed the TRUE part so that the FALSE part (index[min(which(value))]) must contain a value whenever it is used. Why does this still give a warning?
It is a problem because I have data with several thousand groups, most of them are in the "FALSE" bit, and the warnings make the computation extremely slow.
I am happy to use base::ifelse, but I wondered how dplyr::if_else evaluates the TRUE and FALSE sides. Does it somehow evaluate both at the same time?
Recommended answer
The warning arises because, for groups where no value is TRUE, which(value) returns an empty vector (integer(0)). Calling min() on an empty input, just like min(NULL), warns and returns Inf:
min(NULL)
#[1] Inf
Warning message: In min(NULL) : no non-missing arguments to min; returning Inf
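On the "both sides" part of the question, here is a minimal sketch. It assumes a helper named mark() (hypothetical, introduced only for illustration) that records when an argument is actually evaluated. The sketch suggests that base::ifelse never touches its no argument when the test is a single TRUE, whereas dplyr::if_else evaluates both true and false before selecting. That would explain why only if_else triggers the min() warning for the all-FALSE group.

```r
library(dplyr)

# Hypothetical helper: logs a label when its argument is evaluated,
# then returns the value unchanged.
evaluated <- character(0)
mark <- function(label, x) {
  evaluated <<- c(evaluated, label)
  x
}

# base::ifelse with a scalar TRUE test: only `yes` should be evaluated.
evaluated <- character(0)
base_result <- ifelse(TRUE, mark("yes", 1L), mark("no", 2L))
base_evaluated <- evaluated

# dplyr::if_else with the same test: both branches should be evaluated.
evaluated <- character(0)
dplyr_result <- if_else(TRUE, mark("true", 1L), mark("false", 2L))
dplyr_evaluated <- evaluated

print(base_evaluated)
print(dplyr_evaluated)
```

Both calls return the same value; the difference is only in which arguments get evaluated along the way.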
One option is to index the which() output with [1], so that it returns NA instead of an empty vector when nothing is TRUE:
mydf %>%
  group_by(group) %>%
  mutate(max_value = if_else(all(!value), max(index), index[which(value)[1]]))
# A tibble: 15 x 4
# Groups: group [3]
# value group index max_value
# <lgl> <fct> <int> <int>
# 1 FALSE a 1 2
# 2 TRUE a 2 2
# 3 FALSE a 3 2
# 4 FALSE a 4 2
# 5 TRUE a 5 2
# 6 FALSE b 1 4
# 7 FALSE b 2 4
# 8 FALSE b 3 4
# 9 TRUE b 4 4
#10 TRUE b 5 4
#11 FALSE c 1 5
#12 FALSE c 2 5
#13 FALSE c 3 5
#14 FALSE c 4 5
#15 FALSE c 5 5
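To make the [1] trick concrete, the following base R sketch walks through what happens for an all-FALSE group. The variable names (hits, first_hit, picked) are illustrative only. Indexing an empty integer vector with [1] gives NA rather than an empty result, so no call to min() is needed and no warning is raised.

```r
# An all-FALSE group, as in group c of the example data
value <- c(FALSE, FALSE, FALSE)
index <- 1:3

hits <- which(value)        # empty integer vector: nothing is TRUE
first_hit <- hits[1]        # out-of-range index on an empty vector gives NA
picked <- index[first_hit]  # indexing with NA gives NA, still of integer type

print(length(hits))
print(first_hit)
print(picked)
```

Because picked is still an integer NA, it has the same type as max(index), which matters for if_else since it is strict about both branches having compatible types.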
In this case, since we are returning a single element per group, if/else is more appropriate:
mydf %>%
  group_by(group) %>%
  mutate(max_value = if (all(!value)) max(index) else index[which(value)[1]])
# A tibble: 15 x 4
# Groups: group [3]
# value group index max_value
# <lgl> <fct> <int> <int>
# 1 FALSE a 1 2
# 2 TRUE a 2 2
# 3 FALSE a 3 2
# 4 FALSE a 4 2
# 5 TRUE a 5 2
# 6 FALSE b 1 4
# 7 FALSE b 2 4
# 8 FALSE b 3 4
# 9 TRUE b 4 4
#10 TRUE b 5 4
#11 FALSE c 1 5
#12 FALSE c 2 5
#13 FALSE c 3 5
#14 FALSE c 4 5
#15 FALSE c 5 5
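As a possible variation on the if/else approach, the per-group logic could be wrapped in a small function. The sketch below assumes a helper named first_true_index() and a toy data frame named toy; neither is part of the original question or of dplyr. Since a plain if/else evaluates only the branch it selects, the all-FALSE case never reaches the indexing step.

```r
library(dplyr)

# Hypothetical helper: index of the first TRUE value,
# or the largest index if no value is TRUE.
first_true_index <- function(value, index) {
  hit <- which(value)
  if (length(hit) == 0L) max(index) else index[hit[1L]]
}

# Toy data: group a has TRUE values, group b is all FALSE
toy <- data.frame(
  value = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  group = rep(c("a", "b"), each = 3),
  index = rep(1:3, times = 2)
)

result <- toy %>%
  group_by(group) %>%
  mutate(max_value = first_true_index(value, index)) %>%
  ungroup()

print(result)
```

Checking length(hit) directly, instead of all(!value), means which() is computed only once per group.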