What are the advantages of the "apply" functions? When are they better to use than "for" loops, and when are they not?


Question


Possible duplicate:

Is R's apply family more than syntactic sugar?

Just what the title says. Stupid question, perhaps, but my understanding has been that when using an "apply" function, the iteration is performed in compiled code rather than in the R parser. This would seem to imply that lapply, for instance, is only faster than a "for" loop if there are a great many iterations and each operation is relatively simple. For instance, if a single call to a function wrapped up in lapply takes 10 seconds, and there are only, say, 12 iterations of it, I would imagine that there's virtually no difference at all between using "for" and "lapply".

Now that I think of it, if the function inside the "lapply" has to be parsed anyway, why should there be ANY performance benefit from using "lapply" instead of "for" unless you're doing something that there are compiled functions for (like summing or multiplying, etc.)?

Thanks in advance!

Josh

Recommended answer

There are several reasons why one might prefer an apply family function over a for loop, or vice versa.

Firstly, for(), apply(), and sapply() will generally be just as quick as each other if executed correctly. lapply() does more of its operating in compiled code within the R internals than the others, so it can be faster than those functions. It appears the speed advantage is greatest when the act of "looping" over the data is a significant part of the compute time; in many general day-to-day uses you are unlikely to gain much from the inherently quicker lapply(). In the end, these all will be calling R functions, so they need to be interpreted and then run.

for() loops can often be easier to implement, especially if you come from a programming background where loops are prevalent. Working in a loop may be more natural than forcing the iterative computation into one of the apply family functions. However, to use for() loops properly, you need to do some extra work to set up storage and manage plugging the output of the loop back together again. The apply functions do this for you automagically. E.g.:

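IN <- runif(10)
# Allocate storage for the output before looping
OUT <- logical(length = length(IN))
# Loop over the indices of IN, not its values
for (i in seq_along(IN)) {
    OUT[i] <- IN[i] > 0.5
}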

That is a silly example, as > is a vectorised operator, but I wanted something to make a point, namely that you have to manage the output. The main thing is that with for() loops, you always allocate sufficient storage to hold the outputs before you start the loop. If you don't know how much storage you will need, then allocate a reasonable chunk of storage, check inside the loop whether you have exhausted it, and bolt on another big chunk when you have.
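The "allocate a chunk, then bolt on another" idea can be sketched roughly as follows. This is only an illustration: the task (drawing uniform numbers until one exceeds 0.99), the chunk size of 100, and the names chunk, draws and n are all assumptions made up for the example.

```r
# Hypothetical task: keep drawing until a value exceeds 0.99, without
# knowing in advance how many draws that will take.
set.seed(42)
chunk <- 100L              # assumed chunk size; tune to the problem
draws <- numeric(chunk)    # initial block of storage
n <- 0L                    # number of slots actually filled

repeat {
    if (n == length(draws)) {
        # Storage exhausted: bolt on another chunk in one step
        draws <- c(draws, numeric(chunk))
    }
    n <- n + 1L
    draws[n] <- runif(1)
    if (draws[n] > 0.99) break
}

# Trim the unused tail of the last chunk
draws <- draws[seq_len(n)]
```

Growing in chunks keeps the number of reallocations small compared with extending the vector by one element on every iteration.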

The main reason, in my mind, for using one of the apply family of functions is for more elegant, readable code. Rather than managing the output storage and setting up the loop (as shown above), we can let R handle that and succinctly ask R to run a function on subsets of our data. Speed usually does not enter into the decision, for me at least. I use the function that suits the situation best and results in simple, easy-to-understand code, because I'm far more likely to waste more time than I save by always choosing the fastest function if I can't remember what the code is doing a day or a week or more later!
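As a rough illustration of that readability argument, here is the thresholding example written three ways. The object names (OUT_loop, OUT_sapply, OUT_vapply) are invented for this sketch; sapply() and vapply() are standard base R functions.

```r
set.seed(1)
IN <- runif(10)

# for() version: we allocate the storage and fill it ourselves
OUT_loop <- logical(length(IN))
for (i in seq_along(IN)) {
    OUT_loop[i] <- IN[i] > 0.5
}

# apply-family versions: R allocates and assembles the result for us
OUT_sapply <- sapply(IN, function(x) x > 0.5)
# vapply() additionally declares the expected type and length of each result
OUT_vapply <- vapply(IN, function(x) x > 0.5, logical(1))
```

All three produce the same logical vector. In real code this particular case would simply be IN > 0.5, since > is vectorised.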

The apply family lend themselves to scalar or vector operations. A for() loop will often lend itself to doing multiple iterated operations using the same index i. For example, I have written code that uses for() loops to do k-fold or bootstrap cross-validation on objects. I probably would never entertain doing that with one of the apply family, as each CV iteration needs multiple operations, access to lots of objects in the current frame, and fills in several output objects that hold the output of the iterations.
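A minimal sketch of what such a cross-validation loop might look like is given below. It is not the author's code: the simulated data, the linear model, and every object name (dat, fold, rmse, slope, pred) are assumptions chosen only to show one index i driving several operations and several output objects.

```r
set.seed(123)
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)    # simulated data, true slope of 2

k <- 5
# Randomly assign each row to one of k roughly equal folds
fold <- sample(rep(seq_len(k), length.out = n))

# Several output objects, all allocated before the loop
rmse  <- numeric(k)   # per-fold prediction error
slope <- numeric(k)   # per-fold fitted slope
pred  <- numeric(n)   # out-of-fold prediction for every row

for (i in seq_len(k)) {
    test <- fold == i
    fit <- lm(y ~ x, data = dat[!test, ])              # fit on the other folds
    pred[test] <- predict(fit, newdata = dat[test, ])  # predict the held-out fold
    rmse[i]  <- sqrt(mean((dat$y[test] - pred[test])^2))
    slope[i] <- coef(fit)[["x"]]
}
```

The same computation could be pushed through lapply(), but the loop body would then have to return a list that is unpacked afterwards, which is arguably less direct here.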

As to the last point, about why lapply() can possibly be faster than for() or apply(), you need to realise that the "loop" can be performed in interpreted R code or in compiled code. Yes, both will still be calling R functions that need to be interpreted, but if you are doing the looping and calling directly from compiled C code (e.g. lapply()), then that is where the performance gain can come from over, say, apply(), which boils down to a for() loop in actual R code. See the source for apply() to see that it is a wrapper around a for() loop, and then look at the code for lapply(), which is:

> lapply
function (X, FUN, ...) 
{
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X)) 
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}
<environment: namespace:base>

and you should see why there can be a difference in speed between lapply() and for() and the other apply family functions. .Internal() is one of R's ways of calling compiled C code used by R itself. Apart from coercing X to a list where needed and a sanity check on FUN via match.fun(), the entire computation is done in C, calling the R function FUN. Compare that with the source for apply().
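One way to make that comparison without reading the full listings is to search each function body for the relevant calls. This is a sketch that assumes an R version in which apply() and lapply() are defined as described above; has_call() is a hypothetical helper written for this example, while body() and all.names() are base R.

```r
# Hypothetical helper: does the body of function f mention `name` anywhere?
has_call <- function(f, name) name %in% all.names(body(f))

apply_has_for       <- has_call(apply, "for")         # R-level loop in apply()
lapply_has_for      <- has_call(lapply, "for")        # no R-level loop expected
lapply_has_internal <- has_call(lapply, ".Internal")  # loop handed to C code

# Either way of looping yields the same result; only where the
# iteration happens differs.
x <- as.list(1:1000)
res_lapply <- lapply(x, function(v) v^2)

res_for <- vector("list", length(x))
for (i in seq_along(x)) {
    res_for[[i]] <- x[[i]]^2
}
```

Wrapping each variant in system.time() gives a rough feel for the difference on a given machine; any gap is likely to shrink as the work done inside FUN grows relative to the loop overhead.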
