data.table alternative for dplyr case_when
Question
Some time ago, dplyr introduced case_when, a nice SQL-like alternative to ifelse.
Is there an equivalent in data.table that would allow you to specify different conditions within one [] statement, without loading additional packages?
Example:
library(dplyr)
df <- data.frame(a = c("a", "b", "a"), b = c("b", "a", "a"))
df <- df %>% mutate(
  new = case_when(
    a == "a" & b == "b" ~ "c",
    a == "b" & b == "a" ~ "d",
    TRUE ~ "e")
)
df
#   a b new
# 1 a b   c
# 2 b a   d
# 3 a a   e
It would certainly be very helpful and make code much more readable (one of the reasons why I keep using dplyr in these cases).
Recommended answer
FYI, a more recent answer for those coming across this post after 2019: the most recent development version of data.table has the fcase function, which provides exactly that. Examples:
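# Lazy evaluation
x = 1:10
# case_when evaluates every right-hand side, so stop() is triggered
dplyr::case_when(
  x < 5L ~ 1L,
  x >= 5L ~ 3L,
  x == 5L ~ stop("provided value is an unexpected one!")
)
# Error in eval_tidy(pair$rhs, env = default_env) :
#   provided value is an unexpected one!
# fcase is lazy: stop() is never evaluated here, because every element
# is already matched by an earlier condition
data.table::fcase(
  x < 5L, 1L,
  x >= 5L, 3L,
  x == 5L, stop("provided value is an unexpected one!")
)
# [1] 1 1 1 1 3 3 3 3 3 3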
# Benchmark
x = sample(1:100, 3e7, replace = TRUE) # 114 MB
microbenchmark::microbenchmark(
  dplyr::case_when(
    x < 10L ~ 0L,
    x < 20L ~ 10L,
    x < 30L ~ 20L,
    x < 40L ~ 30L,
    x < 50L ~ 40L,
    x < 60L ~ 50L,
    x > 60L ~ 60L
  ),
  data.table::fcase(
    x < 10L, 0L,
    x < 20L, 10L,
    x < 30L, 20L,
    x < 40L, 30L,
    x < 50L, 40L,
    x < 60L, 50L,
    x > 60L, 60L
  ),
  times = 5L,
  unit = "s")
# Unit: seconds
#               expr   min    lq  mean median    uq   max neval
#   dplyr::case_when 11.57 11.71 12.22  11.82 12.00 14.02     5
#  data.table::fcase  1.49  1.55  1.67   1.71  1.73  1.86     5
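One detail worth noting about the benchmark conditions: neither call has a branch for x == 60, so those elements match nothing. As far as I know, fcase fills unmatched elements with NA unless a default is supplied. A minimal sketch on a small vector (the vector and the name res are made up for illustration):

```r
library(data.table)

# Small illustrative input: 60 satisfies neither condition below
x <- c(59L, 60L, 61L)

res <- fcase(
  x < 60L, 50L,
  x > 60L, 60L
)

# The unmatched element (60) comes back as NA
print(res)
```

If that gap is unintended, changing the last condition to x >= 60L would close it.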
Source, under "data.table v1.12.9 (in development)". It should be released soon, likely in January 2020.
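Applied to the example from the question, fcase can be used directly inside a single [] statement together with :=. This is only a sketch, assuming an installed data.table version that already ships fcase. The table name dt is made up for illustration, and the default argument plays the role of the TRUE ~ "e" branch in case_when:

```r
library(data.table)

# Same data as in the question, as a data.table (dt is an illustrative name)
dt <- data.table(a = c("a", "b", "a"), b = c("b", "a", "a"))

# Conditions and values alternate; default covers rows matching no condition
dt[, new := fcase(
  a == "a" & b == "b", "c",
  a == "b" & b == "a", "d",
  default = "e"
)]

print(dt)
```

This adds the column by reference, so no reassignment of dt is needed, and the only package loaded is data.table.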