data.table alternative for dplyr case_when

Question

Some time ago, dplyr introduced case_when, a nice SQL-like alternative to ifelse.

Is there an equivalent in data.table that would allow you to specify different conditions within one [] statement, without loading additional packages?

Example:

library(dplyr)

df <- data.frame(a = c("a", "b", "a"), b = c("b", "a", "a"))

df <- df %>% mutate(
  new = case_when(
    a == "a" & b == "b" ~ "c",
    a == "b" & b == "a" ~ "d",
    TRUE ~ "e"
  )
)

df
#   a b new
# 1 a b   c
# 2 b a   d
# 3 a a   e

It would certainly be very helpful and make code much more readable (one of the reasons why I keep using dplyr in these cases).

Answer

FYI, here is a more recent answer for those coming across this post after 2019. data.table 1.13.0 and later provide the fcase function for exactly this. Note that it is not a drop-in replacement for dplyr::case_when, since the syntax is different (each condition and its value are separated by a comma rather than ~), but it is the "native" data.table way of doing the calculation.
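Applied to the question's own example, a minimal sketch (assuming data.table 1.13.0 or later; the object name `dt` is just illustrative) puts fcase() inside a single [] call together with :=. The `default` argument takes the place of case_when's catch-all `TRUE ~ "e"`.

```r
library(data.table)

# Same data as the dplyr example in the question
dt <- data.table(a = c("a", "b", "a"), b = c("b", "a", "a"))

# fcase() takes alternating condition/value pairs; `default` is used
# for rows where no condition is TRUE (like case_when's TRUE ~ "e").
# := adds the new column by reference within one [] call.
dt[, new := fcase(
  a == "a" & b == "b", "c",
  a == "b" & b == "a", "d",
  default = "e"
)]

dt$new
# [1] "c" "d" "e"
```

Because := modifies `dt` by reference, no reassignment (`dt <- ...`) is needed.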

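# Lazy evaluation: fcase() does not evaluate stop() here, because the first
# two conditions already cover every element of x; case_when() evaluates it and errors.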
x = 1:10
data.table::fcase(
    x < 5L, 1L,
    x >= 5L, 3L,
    x == 5L, stop("provided value is an unexpected one!")
)
# [1] 1 1 1 1 3 3 3 3 3 3

dplyr::case_when(
    x < 5L ~ 1L,
    x >= 5L ~ 3L,
    x == 5L ~ stop("provided value is an unexpected one!")
)
# Error in eval_tidy(pair$rhs, env = default_env) :
#  provided value is an unexpected one!

# Benchmark
x = sample(1:100, 3e7, replace = TRUE) # 114 MB
microbenchmark::microbenchmark(
  dplyr::case_when(
    x < 10L ~ 0L,
    x < 20L ~ 10L,
    x < 30L ~ 20L,
    x < 40L ~ 30L,
    x < 50L ~ 40L,
    x < 60L ~ 50L,
    x > 60L ~ 60L
  ),
  data.table::fcase(
    x < 10L, 0L,
    x < 20L, 10L,
    x < 30L, 20L,
    x < 40L, 30L,
    x < 50L, 40L,
    x < 60L, 50L,
    x > 60L, 60L
  ),
  times = 5L,
  unit = "s"
)
# Unit: seconds
#              expr   min    lq  mean median    uq   max neval
#  dplyr::case_when 11.57 11.71 12.22  11.82 12.00 14.02     5
# data.table::fcase  1.49  1.55  1.67   1.71  1.73  1.86     5
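The benchmark above measures speed only. A quick sanity check, assuming nothing beyond data.table itself, runs the same fcase() expression on 1:100 and compares it with base R arithmetic. The `expected` mapping (round down to the nearest 10, capped at 60) is an assumption about what the benchmark intends. It shows that x == 60 satisfies none of the conditions (note `x > 60L` rather than `x >= 60L`), so that element comes back NA; dplyr::case_when likewise returns NA for unmatched values.

```r
library(data.table)

x <- 1:100
res <- fcase(
  x < 10L, 0L,
  x < 20L, 10L,
  x < 30L, 20L,
  x < 40L, 30L,
  x < 50L, 40L,
  x < 60L, 50L,
  x > 60L, 60L
)

# Assumed intended mapping: round down to the nearest 10, capped at 60
expected <- pmin(x %/% 10L * 10L, 60L)

# Positions where fcase() differs from the assumed mapping (NA counts as a mismatch)
mismatch <- which(is.na(res) | res != expected)
mismatch
# [1] 60
```

Changing the last condition to `x >= 60L` (or using `default = 60L`) makes `mismatch` empty.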

Source: data.table NEWS for 1.13.0 (released 24 Jul 2020).