Why are loops slow in R?
Question
I know that loops are slow in R and that I should try to do things in a vectorised manner instead.
But why? Why are loops slow and apply fast? apply calls several sub-functions, and that doesn't seem fast.
Update: I'm sorry, the question was ill-posed. I was confusing vectorisation with apply. My question should have been,
"Why is vectorisation faster?"
Recommended answer
Loops in R are slow for the same reason any interpreted language is slow: every operation carries around a lot of extra baggage.
Look at R_execClosure in eval.c (this is the function that is called to call a user-defined function). It's nearly 100 lines long and performs all sorts of operations: creating an environment for execution, assigning arguments into the environment, etc.
Think how much less happens when you call a function in C (push args onto the stack, jump, pop args).
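Some of that per-call work can be observed from R itself. The following is a minimal sketch, not taken from the answer: call_env is a hypothetical helper that simply returns the environment R created for the call, which shows that every call gets a fresh environment with the argument bound inside it.

```r
# Hypothetical helper (not from the original answer): returns the
# environment that R created for this particular call.
call_env <- function(x) environment()

e1 <- call_env(1)
e2 <- call_env(1)

# Each call built its own environment, even with identical arguments.
distinct_envs <- !identical(e1, e2)

# The argument was bound in that environment under its formal name.
arg_bound <- exists("x", envir = e1, inherits = FALSE)
arg_value <- get("x", envir = e1)
```

Allocating and populating such an environment on every call is bookkeeping that a plain C function call does not need.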
So that is why you get timings like these (as joran pointed out in the comments, it's not actually apply that's being fast; it's the internal C loop in mean that's being fast. apply is just regular old R code):
A = matrix(as.numeric(1:100000))
Using a loop: 0.342 seconds:
system.time({
  Sum = 0
  for (i in seq_along(A)) {
    Sum = Sum + A[[i]]
  }
  Sum
})
Using sum: unmeasurably small:
sum(A)
It's a little disconcerting because, asymptotically, the loop is just as good as sum; there's no practical reason it should be slow. It's just doing more extra work on each iteration.
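As a rough sketch of that point, the snippet below checks that the explicit loop and sum compute the same value, and then asks R how each function is implemented. loop_sum is a hypothetical wrapper around the loop shown earlier; timings are left out on purpose because they depend on the machine and the R version.

```r
A <- matrix(as.numeric(1:100000))

# Hypothetical helper: the explicit loop from the answer, wrapped in a function.
loop_sum <- function(v) {
  total <- 0
  for (i in seq_along(v)) {
    total <- total + v[[i]]
  }
  total
}

# Both approaches perform one addition per element and agree on the result.
same_result <- isTRUE(all.equal(loop_sum(A), sum(A)))

# sum is implemented in C ("builtin"); apply is ordinary R code ("closure").
kinds <- c(sum = typeof(sum), apply = typeof(apply))
```

The amount of arithmetic is identical in both cases; the difference is that the loop pays interpreter overhead on each of the 100000 iterations, while sum pays it once and then runs a compiled loop.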
So consider:
# 0.370 seconds
system.time({
  I = 0
  while (I < 100000) {
    10
    I = I + 1
  }
})
# 0.743 seconds -- double the time just adding parentheses
system.time({
  I = 0
  while (I < 100000) {
    ((((((((((10))))))))))
    I = I + 1
  }
})
(That example was discovered by Radford Neal.)
That's because ( in R is an operator, and it actually requires a name lookup every time you use it:
> `(` = function(x) 2
> (3)
[1] 2
Or, in general, interpreted operations (in any language) have more steps. Of course, those steps provide benefits as well: you couldn't do that ( trick in C.
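For anyone who wants to poke at the ( behaviour without breaking their session, here is a small sketch that inspects the parsed expression and confines the redefinition to a local scope. The variable names are illustrative only.

```r
# The parsed form of (3) is a call whose function position holds the symbol `(`.
expr <- quote((3))
head_name <- as.character(expr[[1]])

# In base R, `(` is an ordinary function object.
paren_is_function <- is.function(`(`)

# Redefine `(` only inside a local scope so the global session is unaffected.
overridden <- local({
  `(` <- function(x) 2
  (3)
})

# Outside that scope the base definition is found again.
restored <- (3)
```

Because the interpreter has to find out which function `(` currently refers to, each pair of parentheses in the loop above costs a lookup and a call.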