Why are loops slow in R?

Question

I know that loops are slow in R and that I should try to do things in a vectorised manner instead.

But why? Why are loops slow while apply is fast? apply calls several sub-functions, which doesn't seem fast.

Update: I'm sorry, the question was ill-posed. I was confusing vectorisation with apply. My question should have been,

Why is vectorisation faster?

Answer

Loops in R are slow for the same reason any interpreted language is slow: every operation carries around a lot of extra baggage.

Look at R_execClosure in eval.c (the function R uses to call a user-defined function). It's nearly 100 lines long and performs all sorts of operations: creating an environment for execution, assigning arguments into that environment, etc.
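Part of that bookkeeping is visible from R itself: every call to a closure gets a fresh evaluation environment with the arguments bound into it. The sketch below uses hypothetical helpers `show_env` and `add_one` (not part of R) to show this, and to time the same loop with and without a user-defined function call per iteration. Base `+` is a primitive, so it does not go through R_execClosure.

```r
# Hypothetical helper: returns the environment R created for this call
show_env <- function(x) environment()

e1 <- show_env(5)
e2 <- show_env(5)

identical(e1, e2)      # FALSE: every call gets a fresh environment
ls(e1)                 # "x": the argument was bound into it
get("x", envir = e1)   # 5

# Hypothetical comparison: same loop, with and without a closure call
add_one <- function(v) v + 1
n <- 1e5
system.time({ s1 <- 0; for (i in seq_len(n)) s1 <- add_one(s1) })  # closure call each time
system.time({ s2 <- 0; for (i in seq_len(n)) s2 <- s2 + 1 })       # primitive `+` only
```

Exact timings depend on the machine and R version, so none are quoted here; the point is that only the first loop pays for environment creation and argument binding on every iteration.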

Think how much less happens when you call a function in C (push args onto the stack, jump, pop args).

So that is why you get timings like these (as joran pointed out in the comment, it's not actually apply that's being fast; it's the internal C loop in mean that's being fast. apply is just regular old R code):
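To check that apply is ordinary R code while the speed comes from compiled internals, one can compare `apply(M, 1, mean)` with `rowMeans(M)`, which does the whole row loop in C. A sketch, with the matrix `M` made up for illustration:

```r
M <- matrix(as.numeric(1:20000), nrow = 200)

# apply() is a regular R closure (print `apply` to read its R source);
# it calls FUN once per row from R code
typeof(apply)     # "closure"
is.primitive(apply)   # FALSE

system.time(r1 <- apply(M, 1, mean))   # one R-level call to mean() per row
system.time(r2 <- rowMeans(M))         # single call; the loop runs in C

isTRUE(all.equal(r1, r2))   # TRUE: same row means
```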

A = matrix(as.numeric(1:100000))

Using a loop: 0.342 seconds:

system.time({
    Sum = 0
    for (i in seq_along(A)) {
        Sum = Sum + A[[i]]
    }
    Sum
})

Using sum: unmeasurably small:

sum(A)
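A side-by-side sketch of the two versions on the same `A`: the answer's loop wrapped in a function (the name `loop_sum` is just for illustration) and the single call to `sum`. Both produce the same number; only the time differs.

```r
A <- matrix(as.numeric(1:100000))

# The answer's explicit loop, wrapped in a function for reuse
loop_sum <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[[i]]   # interpreted `[[` and `+` on every element
  }
  total
}

system.time(s_loop <- loop_sum(A))
system.time(s_vec <- sum(A))   # the loop over elements runs in compiled code

s_loop == s_vec   # TRUE: both equal 5000050000
```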

It's a little disconcerting because, asymptotically, the loop is just as good as sum; there's no practical reason it should be slow; it's just doing more extra work each iteration.
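One rough way to see "same asymptotics, bigger constant" is to time both approaches at two sizes and divide by n: for linear algorithms the per-element cost should stay roughly flat as n grows, with the loop's constant much larger. A sketch; `loop_total` is a hypothetical helper, and `sum` may fall below the timer's resolution (reporting 0).

```r
# Hypothetical helper: the same O(n) loop as above, written compactly
loop_total <- function(x) {
  total <- 0
  for (v in x) total <- total + v
  total
}

sizes <- c(1e5, 1e6)
for (n in sizes) {
  x <- as.numeric(seq_len(n))
  t_loop <- system.time(loop_total(x))[["elapsed"]]
  t_sum  <- system.time(sum(x))[["elapsed"]]
  # Linear algorithms: elapsed / n should stay roughly constant as n grows
  cat(sprintf("n = %.0e  loop: %.2e s/elem  sum: %.2e s/elem\n",
              n, t_loop / n, t_sum / n))
}
```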

So consider:

# 0.370 seconds
system.time({
    I = 0
    while (I < 100000) {
        10
        I = I + 1
    }
})

# 0.743 seconds -- double the time just adding parentheses
system.time({
    I = 0
    while (I < 100000) {
        ((((((((((10))))))))))
        I = I + 1
    }
})

(This example was found by Radford Neal.)

This is because ( in R is an operator that actually requires a name lookup every time you use it:

> `(` = function(x) 2
> (3)
[1] 2
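The parser makes this visible: each pair of parentheses becomes a call whose function is the symbol `(`, so ten pairs mean ten nested calls, each resolved by name at run time. A short sketch using only base R (the variable names are illustrative):

```r
e <- quote((10))

class(e)                          # "("
identical(e[[1]], as.name("("))   # TRUE: the call's function is the symbol `(`
is.primitive(`(`)                 # TRUE: base R's `(` is a primitive

# Build ten nested `(` calls programmatically instead of typing them out
nested <- 10
for (k in 1:10) nested <- call("(", nested)
deparse(nested)   # prints the nested, fully parenthesised form
eval(nested)      # 10: ten lookups of `(`, each returning its argument
```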

Or, in general, interpreted operations (in any language) have more steps. Of course, those steps provide benefits as well: you couldn't do that ( trick in C.
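That flexibility also respects scope. A sketch that shadows `(` only inside `local()`, so the redefinition disappears when the block ends (unlike the top-level assignment above, which stays in effect until `rm("(")`):

```r
res_inside <- local({
  `(` <- function(x) 2   # name lookup for `(` now finds this binding first
  (3)
})
res_inside    # 2

res_outside <- (3)   # outside local(), lookup finds base's `(` again
res_outside   # 3
```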
