Does ifelse really calculate both of its vectors every time? Is it slow?
Question
Does ifelse really calculate both the yes and no vectors, as in the entirety of each vector? Or does it just calculate some values from each vector?

Also, is ifelse really that slow?
Recommended answer
Yes, with one exception.

ifelse calculates both its yes value and its no value, except when the test condition is either all TRUE or all FALSE.
We can see this by generating random numbers and observing how many are actually generated. Resetting the seed afterwards lets us find out how far the generator had advanced.
# TEST CONDITION, ALL TRUE
set.seed(1)
dump <- ifelse(rep(TRUE, 200), rnorm(200), rnorm(200))
next.random.number.after.all.true <- rnorm(1)
# TEST CONDITION, ALL FALSE
set.seed(1)
dump <- ifelse(rep(FALSE, 200), rnorm(200), rnorm(200))
next.random.number.after.all.false <- rnorm(1)
# TEST CONDITION, MIXED
set.seed(1)
dump <- ifelse(c(FALSE, rep(TRUE, 199)), rnorm(200), rnorm(200))
next.random.number.after.some.TRUE.some.FALSE <- rnorm(1)
# RESET THE SEED, GENERATE SEVERAL RANDOM NUMBERS TO SEARCH FOR A MATCH
set.seed(1)
r.1000 <- rnorm(1000)
cat("Quantity of random numbers generated during the `ifelse` statement when:",
"\n\tAll True ", which(r.1000 == next.random.number.after.all.true) - 1,
"\n\tAll False ", which(r.1000 == next.random.number.after.all.false) - 1,
"\n\tMixed T/F ", which(r.1000 == next.random.number.after.some.TRUE.some.FALSE) - 1
)
This gives the following output:
Quantity of random numbers generated during the `ifelse` statement when:
	All True   200
	All False  200
	Mixed T/F  400   <~~ Notice TWICE AS MANY numbers were
	                     generated when `test` had both
	                     T & F values present
We can also see it in the source code itself:
.
.
if (any(test[!nas]))
    ans[test & !nas] <- rep(yes, length.out = length(ans))[test & !nas]    # <~~~~ This line and the one below
if (any(!test[!nas]))
    ans[!test & !nas] <- rep(no, length.out = length(ans))[!test & !nas]   # <~~~~ ... are the culprits
.
.
Notice that yes and no are computed only if there is some non-NA value of test that is TRUE or FALSE (respectively).

At that point, and this is the important part when it comes to efficiency, the entirety of each vector is computed.
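The same behaviour can be observed without the random number generator. The following is a minimal sketch, not part of the original answer: traced() is a hypothetical helper that records its label each time it is evaluated, so the log shows which of the yes and no arguments ifelse actually touched. It relies only on R's lazy argument evaluation, so it should hold even though the exact source of ifelse differs between R versions.

```r
# Hypothetical helper (not from the original answer): logs its label
# whenever it is evaluated, then returns a vector of length n.
calls <- character(0)
traced <- function(label, n) {
  calls <<- c(calls, label)
  seq_len(n)
}

# test is all TRUE: only the yes argument should be evaluated
calls <- character(0)
res_all_true <- ifelse(rep(TRUE, 5), traced("yes", 5), traced("no", 5))
calls_all_true <- calls

# test is all FALSE: only the no argument should be evaluated
calls <- character(0)
res_all_false <- ifelse(rep(FALSE, 5), traced("yes", 5), traced("no", 5))
calls_all_false <- calls

# test is mixed: both arguments are evaluated, each in full,
# even though only one element of no is used
calls <- character(0)
res_mixed <- ifelse(c(FALSE, rep(TRUE, 4)), traced("yes", 5), traced("no", 5))
calls_mixed <- calls

print(calls_all_true)
print(calls_all_false)
print(calls_mixed)
```

Each argument is evaluated at most once per call, but when it is evaluated the whole vector is produced, which is where the cost comes from.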
Let's see if we can test it:
library(microbenchmark)
# Create some sample data
N <- 1e4
set.seed(1)
X <- sample(c(seq(100), rep(NA, 100)), N, TRUE)
Y <- ifelse(is.na(X), rnorm(length(X)), NA) # Y is non-NA exactly where X is NA, and vice versa
These two statements generate the same results once sorted:
yesifelse <- quote(sort(ifelse(is.na(X), Y+17, X-17 ) ))
noiflese <- quote(sort(c(Y[is.na(X)]+17, X[is.na(Y)]-17)))
identical(eval(yesifelse), eval(noiflese))
# [1] TRUE
But one is roughly twice as fast as the other:
microbenchmark(eval(yesifelse), eval(noiflese), times=50L)
N = 1,000
Unit: milliseconds
            expr      min       lq   median       uq      max neval
 eval(yesifelse) 2.286621 2.348590 2.411776 2.537604 10.05973    50
  eval(noiflese) 1.088669 1.093864 1.122075 1.149558 61.23110    50

N = 10,000
Unit: milliseconds
            expr      min       lq   median       uq      max neval
 eval(yesifelse) 30.32039 36.19569 38.50461 40.84996 98.77294    50
  eval(noiflese) 12.70274 13.58295 14.38579 20.03587 21.68665    50
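If the original element order has to be preserved (the benchmark above sorts the results, so order did not matter there), the same idea can be written with indexed assignment. This is only a sketch under assumptions: the data below is a smaller stand-in built the same way as the sample data, and the names idx, via_ifelse and via_index are made up for illustration. No timing is claimed for it here; it is worth benchmarking on your own data.

```r
# Stand-in data, built like the sample data above but smaller (assumption)
set.seed(1)
X <- sample(c(seq(100), rep(NA, 100)), 1000, TRUE)
Y <- ifelse(is.na(X), rnorm(length(X)), NA)

# Reference: with a mixed test, ifelse() evaluates Y + 17 and X - 17 in full
via_ifelse <- ifelse(is.na(X), Y + 17, X - 17)

# Sketch: compute each branch only for the elements that need it,
# writing the results back into their original positions
idx <- is.na(X)
via_index <- numeric(length(X))
via_index[idx]  <- Y[idx] + 17
via_index[!idx] <- X[!idx] - 17

# Compare the two approaches element by element
print(all.equal(via_ifelse, via_index))
```

The design choice is the same as in the c(...) version: subset first, then do the arithmetic, so no work is spent on elements that would be thrown away.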