Why are loops slow in R?

Question

I know that loops are slow in R and that I should try to do things in a vectorised manner instead.

But why? Why are loops slow and apply fast? apply calls several sub-functions, which doesn't seem fast.

Update: I'm sorry, the question was ill-posed. I was confusing vectorisation with apply. My question should have been:

"Why is vectorisation faster?"

Recommended answer

Loops in R are slow for the same reason any interpreted language is slow: every operation carries around a lot of extra baggage.

Look at R_execClosure in eval.c (this is the function called to call a user-defined function). It's nearly 100 lines long and performs all sorts of operations -- creating an environment for execution, assigning arguments into the environment, etc.

Think how much less happens when you call a function in C (push args on to stack, jump, pop args).
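Part of that per-call work can be observed from R itself. The following is a minimal sketch, not a view into eval.c: the function name f is made up for illustration, and it only demonstrates that each call to a closure gets its own freshly created environment with the argument bound inside it.

```r
# f is a hypothetical closure that simply returns its own execution environment.
f <- function(x) environment()

e1 <- f(1)
e2 <- f(2)

# Two calls produce two distinct environments, so one was created per call.
same_env <- identical(e1, e2)

# The argument x was bound inside each call's environment.
x1 <- get("x", envir = e1)
x2 <- get("x", envir = e2)

print(same_env)  # FALSE
print(x1)        # 1
print(x2)        # 2
```

A C function call has no equivalent of this step: there is no environment object to allocate and no name-to-value binding to set up at run time.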

So that is why you get timings like these (as joran pointed out in the comments, it's not actually apply that's fast; it's the internal C loop in mean that's fast. apply is just regular old R code):

A = matrix(as.numeric(1:100000))

Using a loop: 0.342 seconds:

system.time({
    Sum = 0
    for (i in seq_along(A)) {
        Sum = Sum + A[[i]]
    }
    Sum
})

Using sum: unmeasurably small:

sum(A)

It's a little disconcerting because, asymptotically, the loop is just as good as sum: both are O(n). There's no practical reason it should be slow; it's just doing much more work on each iteration.
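To check this on your own machine, a sketch like the following compares the two approaches on the same data. The helper name loop_sum and the repetition count of 20 are assumptions made for illustration. Absolute timings depend on hardware and R version; since R 3.4.0 functions and loops are byte-compiled by default, so the gap is usually smaller than the figures quoted above, though the built-in sum is still expected to win.

```r
A <- matrix(as.numeric(1:100000))

# loop_sum is a hypothetical helper: an explicit R-level loop over the elements.
loop_sum <- function(v) {
  total <- 0
  for (i in seq_along(v)) {
    total <- total + v[[i]]
  }
  total
}

# Both approaches compute the same value ...
stopifnot(isTRUE(all.equal(loop_sum(A), sum(A))))

# ... but differ in how much interpreter work is done per element.
reps <- 20
t_loop <- system.time(for (k in seq_len(reps)) loop_sum(A))[["elapsed"]]
t_sum  <- system.time(for (k in seq_len(reps)) sum(A))[["elapsed"]]

cat("R-level loop:", t_loop, "seconds for", reps, "runs\n")
cat("built-in sum:", t_sum, "seconds for", reps, "runs\n")
```

Both versions touch every element exactly once; the difference is that sum does its iteration in compiled C, while the loop pays the interpreter's per-operation cost on every step.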

So consider:

# 0.370 seconds
system.time({
    I = 0
    while (I < 100000) {
        10
        I = I + 1
    }
})

# 0.743 seconds -- double the time just adding parentheses
system.time({
    I = 0
    while (I < 100000) {
        ((((((((((10))))))))))
        I = I + 1
    }
})

(That example was discovered by Radford Neal.)

That is because ( in R is an operator, and it actually requires a name lookup every time you use it:

> `(` = function(x) 2
> (3)
[1] 2

Or, in general, interpreted operations (in any language) have more steps. Of course, those steps provide benefits as well: you couldn't do that ( trick in C.
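The claim that ( is an ordinary function call can be inspected directly. The sketch below is illustrative only: paren_depth is a hypothetical helper written for this example, and the redefinition is wrapped in local() so that it does not leak into the rest of the session.

```r
# The parser turns (3) into a call whose head is the symbol "(".
expr <- quote((3))
is_call <- is.call(expr)
head_is_paren <- identical(expr[[1]], as.name("("))

# paren_depth (hypothetical helper) counts how many nested "(" calls wrap a value.
paren_depth <- function(e) {
  depth <- 0
  while (is.call(e) && identical(e[[1]], as.name("("))) {
    depth <- depth + 1
    e <- e[[2]]
  }
  depth
}
depth <- paren_depth(quote(((10))))

# Shadowing "(" inside local() changes what (3) evaluates to, but only there.
shadowed <- local({
  `(` <- function(x) 2
  (3)
})
unshadowed <- (3)

print(is_call)        # TRUE
print(head_is_paren)  # TRUE
print(depth)          # 2
print(shadowed)       # 2
print(unshadowed)     # 3
```

Each level of nesting is one more call for the interpreter to look up and dispatch, which is consistent with the extra time measured in the ten-parenthesis loop.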
