Does ifelse really calculate both of its vectors every time? Is it slow?


Question

Does ifelse really calculate both the yes and no vectors, as in the entirety of each vector? Or does it just calculate some values from each vector?

Also, is ifelse really that slow?

Recommended answer

Yes. (With exception)

ifelse calculates both its yes value and its no value, except when the test condition is either all TRUE or all FALSE. In that case only the branch that is actually needed is evaluated.

We can see this by generating random numbers inside ifelse and observing how many are actually generated. To count them, we reset the seed and look up where the next draw falls in the random sequence.

# TEST CONDITION, ALL TRUE
set.seed(1)
dump  <- ifelse(rep(TRUE, 200), rnorm(200), rnorm(200))
next.random.number.after.all.true <- rnorm(1)

# TEST CONDITION, ALL FALSE
set.seed(1)
dump  <- ifelse(rep(FALSE, 200), rnorm(200), rnorm(200))
next.random.number.after.all.false <- rnorm(1)

# TEST CONDITION, MIXED
set.seed(1)
dump   <- ifelse(c(FALSE, rep(TRUE, 199)), rnorm(200), rnorm(200))
next.random.number.after.some.TRUE.some.FALSE <- rnorm(1)

# RESET THE SEED, GENERATE SEVERAL RANDOM NUMBERS TO SEARCH FOR A MATCH
set.seed(1)
r.1000 <- rnorm(1000)


cat("Quantity of random numbers generated during the `ifelse` statement when:", 
    "\n\tAll True  ", which(r.1000 == next.random.number.after.all.true) - 1,
    "\n\tAll False ", which(r.1000 == next.random.number.after.all.false) - 1,
    "\n\tMixed T/F ", which(r.1000 == next.random.number.after.some.TRUE.some.FALSE) - 1 
  )

This gives the following output:

Quantity of random numbers generated during the `ifelse` statement when: 
  All True   200 
  All False  200 
  Mixed T/F  400   <~~ Notice TWICE AS MANY numbers were
                       generated when `test` had both
                       T & F values present
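
The same behaviour can be checked more directly by recording which branch expressions get evaluated. The sketch below is illustrative only: noisy and evaluated are hypothetical helpers invented for this example, not part of base R or of the original answer.

```r
# Hypothetical helper: appends its label to `calls` whenever it is evaluated,
# then returns a character vector of length n.
calls <- character(0)
noisy <- function(label, n) {
  calls <<- c(calls, label)
  rep(label, n)
}

# Hypothetical helper: runs ifelse on `test` and reports which branch
# expressions were evaluated along the way.
evaluated <- function(test) {
  calls <<- character(0)
  invisible(ifelse(test, noisy("yes", length(test)), noisy("no", length(test))))
  calls
}

all_true  <- evaluated(rep(TRUE, 5))           # only the yes branch runs
all_false <- evaluated(rep(FALSE, 5))          # only the no branch runs
mixed     <- evaluated(c(FALSE, rep(TRUE, 4))) # both branches run

print(list(all_true = all_true, all_false = all_false, mixed = mixed))
```

Because R evaluates function arguments lazily, a branch that ifelse never touches is never computed, which is why the all-TRUE and all-FALSE cases only record one label.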






We can also see it in the source code itself:

.
.
if (any(test[!nas]))    
    ans[test & !nas] <- rep(yes, length.out = length(ans))[test &   # <~~~~ This line and the one below
        !nas]
if (any(!test[!nas])) 
    ans[!test & !nas] <- rep(no, length.out = length(ans))[!test &  # <~~~~ ... are the culprits
        !nas]
.
.

Notice that yes and no are computed only if there is some non-NA value of test that is TRUE or FALSE (respectively).
At that point, and this is the important part when it comes to efficiency, the entirety of each vector is computed, even though only the elements selected by test are kept.
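
A small illustration of the consequence, offered as a sketch rather than as part of the original answer: because the whole yes vector is computed, ifelse applies sqrt even to elements that the test excludes, while plain subset assignment computes only what is needed. The variable names (x, keep, via_ifelse, via_subset) are made up for this example.

```r
x <- c(4, -1, 9, NA, 16)

# ifelse evaluates sqrt(x) on the whole vector, including the negative
# element that the test rules out, so R raises a "NaNs produced" warning.
# The handler reports the warning and then muffles it.
via_ifelse <- withCallingHandlers(
  ifelse(x >= 0, sqrt(x), 0),
  warning = function(w) {
    message("ifelse version warned: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Subset assignment computes sqrt only on the elements that need it.
keep <- !is.na(x) & x >= 0
via_subset <- rep(NA_real_, length(x))
via_subset[keep] <- sqrt(x[keep])
via_subset[!keep & !is.na(x)] <- 0

print(via_ifelse)
print(via_subset)
```

Both versions return the same values; the difference is how much work is done to get there, which matters more as the branch expressions become more expensive.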

Let's see if we can test it:

library(microbenchmark)

# Create some sample data
  N <- 1e4
  set.seed(1)
  X <- sample(c(seq(100), rep(NA, 100)), N, TRUE)
  Y <- ifelse(is.na(X), rnorm(X), NA)  # Y is non-NA exactly where X is NA, and NA everywhere else



These two statements generate the same results:

yesifelse <- quote(sort(ifelse(is.na(X), Y+17, X-17 ) ))
noiflese  <- quote(sort(c(Y[is.na(X)]+17, X[is.na(Y)]-17)))

identical(eval(yesifelse), eval(noiflese))
# [1] TRUE



but the version without ifelse is at least twice as fast by median time:

microbenchmark(eval(yesifelse), eval(noiflese), times=50L)

N = 1,000
Unit: milliseconds
            expr      min       lq   median       uq      max neval
 eval(yesifelse) 2.286621 2.348590 2.411776 2.537604 10.05973    50
  eval(noiflese) 1.088669 1.093864 1.122075 1.149558 61.23110    50

N = 10,000
Unit: milliseconds
            expr      min       lq   median       uq      max neval
 eval(yesifelse) 30.32039 36.19569 38.50461 40.84996 98.77294    50
  eval(noiflese) 12.70274 13.58295 14.38579 20.03587 21.68665    50
