Why are loops slow in R?
Question
I know that loops are slow in R and that I should try to do things in a vectorised manner instead.
But why? Why are loops slow and apply fast? apply calls several sub-functions, and that doesn't seem fast.
Update: I'm sorry, the question was ill-posed. I was confusing vectorisation with apply. My question should have been, "Why is vectorisation faster?"
Recommended answer
Loops in R are slow for the same reason loops in any interpreted language are slow: every operation carries around a lot of extra baggage.
Look at R_execClosure in eval.c (this is the function that is called to invoke a user-defined function). It's nearly 100 lines long and performs all sorts of operations: creating an environment for execution, assigning arguments into the environment, etc.
Think how much less happens when you call a function in C (push args onto the stack, jump, pop args).
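A rough way to observe this per-call overhead from R itself is to time the same arithmetic with and without a closure call inside the loop. This is only a sketch: add_one, with_call, and inline are hypothetical names introduced here, and the absolute timings depend on the machine and R version, so none are quoted.

```r
# Hypothetical helper: a trivial user-defined function (a closure).
add_one <- function(x) x + 1

# Loop that calls the closure on every iteration.
with_call <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- add_one(total)
  total
}

# Same arithmetic written inline, with no closure call.
inline <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + 1
  total
}

n <- 100000

# Both versions compute the same value; only the per-iteration work differs.
stopifnot(identical(with_call(n), inline(n)))

# Elapsed times in seconds; compare them on your own machine.
t_call   <- system.time(with_call(n))[["elapsed"]]
t_inline <- system.time(inline(n))[["elapsed"]]
print(c(with_call = t_call, inline = t_inline))
```

Any difference between the two timings comes from the work R does to set up and tear down each call to add_one, since the arithmetic is identical.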
So that is why you get timings like these (as joran pointed out in the comments, it's not actually apply that's being fast; it's the internal C loop in mean that's being fast. apply is just regular old R code):
A = matrix(as.numeric(1:100000))
Using a loop: 0.342 seconds:
system.time({
  Sum = 0
  for (i in seq_along(A)) {
    Sum = Sum + A[[i]]
  }
  Sum
})
Using sum: unmeasurably small:
sum(A)
It's a little disconcerting because, asymptotically, the loop is just as good as sum; there's no practical reason it should be slow. It's just doing more extra work on each iteration.
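joran's point that apply is ordinary R code can be checked with a small sketch. It computes column means three ways: with apply, with an explicit for loop, and with colMeans, which does its looping in compiled code. The matrix M and its dimensions are made up for illustration, and no timings are quoted because they vary by machine and R version.

```r
# Illustrative data: a 200 x 500 matrix of uniform random numbers.
set.seed(1)
M <- matrix(runif(200 * 500), nrow = 200)

# apply: an R-level loop over columns, calling mean() once per column.
via_apply <- apply(M, 2, mean)

# Explicit for loop doing the same work.
via_loop <- numeric(ncol(M))
for (j in seq_len(ncol(M))) {
  via_loop[j] <- mean(M[, j])
}

# colMeans: the loop over columns happens inside compiled code.
via_colMeans <- colMeans(M)

# All three agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(via_apply, via_loop)),
          isTRUE(all.equal(via_apply, via_colMeans)))

# Wrap any of the three in system.time() to compare them locally.
```

The first two approaches make one R-level call to mean per column, so they carry similar interpreter overhead; colMeans avoids that by making a single call.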
So consider:
# 0.370 seconds
system.time({
  I = 0
  while (I < 100000) {
    10
    I = I + 1
  }
})
# 0.743 seconds -- double the time just adding parentheses
system.time({
  I = 0
  while (I < 100000) {
    ((((((((((10))))))))))
    I = I + 1
  }
})
(That example was discovered by Radford Neal.)
Because ( in R is an operator, and actually requires a name lookup every time you use it:
> `(` = function(x) 2
> (3)
[1] 2
Or, in general, interpreted operations (in any language) have more steps. Of course, those steps provide benefits as well: you couldn't do that ( trick in C.
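The ( redefinition shown earlier can also be explored without overwriting ( for the whole session. The following is a minimal sketch that inspects how a parenthesised expression is parsed, then shadows ( only inside a temporary environment created by local(). The variable names e and shadowed are introduced here for illustration.

```r
# A parenthesised expression parses to a call whose function is the symbol `(`.
e <- quote((3))
stopifnot(is.call(e), identical(e[[1]], as.name("(")))

# Calling `(` by name is the same as writing the parentheses.
stopifnot(identical(`(`(3), 3))

# Shadow `(` only inside a throwaway environment,
# so the definition disappears when local() returns.
shadowed <- local({
  `(` <- function(x) 2
  (3)
})

print(shadowed)  # value produced by the locally defined `(`
print((3))       # outside local(), the usual `(` applies again
```

Because the interpreter has to find whatever ( currently refers to before it can evaluate the expression, each extra pair of parentheses adds a lookup and a call.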