Is R's apply family more than syntactic sugar?

Question

...regarding execution time and / or memory.

If this is not true, prove it with a code snippet. Note that speedup by vectorization does not count. The speedup must come from apply (tapply, sapply, ...) itself.

Recommended answer

The apply functions in R don't provide improved performance over other looping functions (e.g. for). One exception to this is lapply which can be a little faster because it does more work in C code than in R (see this question for an example of this).
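
As a rough illustration of why the two forms are interchangeable in terms of results, the sketch below computes the same list with an explicit for loop and with lapply. The helper `square` and the variable names are hypothetical, chosen only for this example.

```r
# Hypothetical helper used only for illustration.
square <- function(x) x^2

xs <- 1:5

# Explicit loop: preallocate the result list, then fill it element by element.
out_for <- vector("list", length(xs))
for (i in seq_along(xs)) {
  out_for[[i]] <- square(xs[[i]])
}

# lapply: the iteration and the collection of results happen inside lapply.
out_lapply <- lapply(xs, square)

# Both approaches produce the same list.
identical(out_for, out_lapply)
```

The loop needs bookkeeping (preallocation and indexing) that lapply handles internally, which is the clarity argument rather than a performance one.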

But in general, the rule is that you should use an apply function for clarity, not for performance.

I would add that the apply functions have no side effects by default: assignments made inside the applied function stay local to that call. This is an important distinction when it comes to functional programming with R. It can be overridden by using assign or <<-, but that can be very dangerous. Side effects also make a program harder to understand, since a variable's state depends on the history.
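
A minimal sketch of that override, assuming a hypothetical accumulator variable named `total`: ordinary assignment inside the applied function leaves the outer variable alone, while `<<-` reaches out and modifies it.

```r
total <- 0

# Ordinary assignment creates a variable local to each function call;
# the outer `total` is untouched.
invisible(sapply(1:3, function(i) total <- total + i))
total_after_local <- total   # still 0

# `<<-` assigns in the enclosing environment, so each call now has a side effect.
invisible(sapply(1:3, function(i) total <<- total + i))
total_after_super <- total   # 0 + 1 + 2 + 3 = 6
```

The second version only gives the right answer because the calls happen sequentially in one process, which is part of why this pattern is risky.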

To emphasize this, here is a trivial example that recursively calculates Fibonacci numbers. It could be run multiple times to get a more accurate measurement, but the point is that none of the methods performs significantly differently:

> fibo <- function(n) {
+   if ( n < 2 ) n
+   else fibo(n-1) + fibo(n-2)
+ }
> system.time(for(i in 0:26) fibo(i))
   user  system elapsed 
   7.48    0.00    7.52 
> system.time(sapply(0:26, fibo))
   user  system elapsed 
   7.50    0.00    7.54 
> system.time(lapply(0:26, fibo))
   user  system elapsed 
   7.48    0.04    7.54 
> library(plyr)
> system.time(ldply(0:26, fibo))
   user  system elapsed 
   7.52    0.00    7.58 
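
One possible way to run the comparison several times, using only base R, is sketched below. The helper `time_median` and the smaller input range are assumptions made for this illustration, not part of the original measurement, and no particular timings are implied.

```r
fibo <- function(n) if (n < 2) n else fibo(n - 1) + fibo(n - 2)

# Hypothetical helper: median elapsed time of calling `expr_fun` over `reps` runs.
time_median <- function(expr_fun, reps = 5) {
  elapsed <- replicate(reps, system.time(expr_fun())[["elapsed"]])
  median(elapsed)
}

# Smaller than the 0:26 used above so the sketch finishes quickly.
ns <- 0:18

timings <- c(
  for_loop = time_median(function() for (i in ns) fibo(i)),
  sapply   = time_median(function() sapply(ns, fibo)),
  lapply   = time_median(function() lapply(ns, fibo))
)
timings
```

Taking the median over repeated runs reduces the influence of one-off pauses such as garbage collection.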

Edit 2:

Regarding parallel packages for R (e.g. rpvm, Rmpi, snow): these generally provide apply-family functions (even the foreach package is essentially equivalent, despite its name). Here's a simple example of the parallel counterpart of sapply in snow:

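library(snow)
# Two worker processes on the local machine, connected over sockets
cl <- makeSOCKcluster(c("localhost","localhost"))
# Add 3 to each element of 1:20 on the workers
parSapply(cl, 1:20, get("+"), 3)
stopCluster(cl)  # shut the workers down when done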

This example uses a socket cluster, for which no additional software needs to be installed; otherwise you will need something like PVM or MPI (see Tierney's clustering page). snow has the following apply functions:

parLapply(cl, x, fun, ...)
parSapply(cl, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
parApply(cl, X, MARGIN, FUN, ...)
parRapply(cl, x, fun, ...)
parCapply(cl, x, fun, ...)
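
The parallel package that ships with R offers functions with the same names, so the socket-cluster example can also be sketched without installing snow. This is an illustrative variant rather than the original answer's code; the worker count of 2 is an arbitrary choice.

```r
library(parallel)

# Start two local worker processes (a socket cluster by default).
cl <- makeCluster(2)

# Add 3 to each element of 1:20, distributing the work across the workers.
res <- parSapply(cl, 1:20, function(x, y) x + y, 3)

# Shut the workers down when finished.
stopCluster(cl)

res
```

The extra argument `3` is forwarded to the function on each worker, just as it would be with sapply.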

It makes sense that apply functions should be used for parallel execution, since they have no side effects. When you change a variable's value within a for loop, the change is made in the environment the loop runs in (the global environment, at top level). All apply functions, on the other hand, can safely be used in parallel because changes are local to the function call (unless you use assign or <<-, in which case you can introduce side effects). Needless to say, it's critical to be careful about local vs. global variables, especially when dealing with parallel execution.

Here's a trivial example to demonstrate the difference between for and *apply so far as side effects are concerned:

> df <- 1:10
> # *apply example
> lapply(2:3, function(i) df <- df * i)
> df
 [1]  1  2  3  4  5  6  7  8  9 10
> # for loop example
> for(i in 2:3) df <- df * i
> df
 [1]  6 12 18 24 30 36 42 48 54 60

Note how the df in the parent environment is altered by for but not *apply.
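
If the cumulative result of the loop is wanted without relying on a side effect, one option is to let a function return the updated value and assign it explicitly. The sketch below uses base R's Reduce for this; the names `df_scaled` and `acc` are hypothetical.

```r
df <- 1:10

# Reduce passes the accumulated value through each step and returns the final
# one, so the update is an explicit return value rather than a side effect.
df_scaled <- Reduce(function(acc, i) acc * i, 2:3, df)

df         # unchanged
df_scaled  # same values the for loop above produced
```

Here `df` is passed as the initial value, multiplied by 2 and then by 3, and the original vector is left as it was.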
