Loops inefficiency in R

Views: 338 · Posted: 2018/4/18 15:27:14 · Tags: r, functional-programming, loops


Question

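Good morning,

I have been developing in R for a few months, and I have to make sure that the execution time of my code is not too long because I analyze big datasets.

Hence, I have been trying to use as many vectorized functions as possible. However, I am still wondering about something.

What is costly in R is not the loop itself, right?
I mean, the problem arises when you start modifying variables within the loop, for example. Is that correct?

Hence I was wondering: what if you simply have to run a function on each element and do not actually care about the result, for example to write data to a database? What should you do?

1) Use mapply without storing the result anywhere?

2) Loop over the vector and only apply f(i) to each element?

3) Is there a better function I might have missed?

(This of course assumes that your function is not optimally vectorized.)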
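Options 1 and 2 can be sketched as follows. This is only an illustration: write_row() is a hypothetical stand-in for the database write (it just bumps a counter), and the names state and ids are likewise made up for the example.

```r
# Hypothetical side-effect function standing in for a database write.
# It only increments a counter so the sketch stays self-contained.
state <- new.env()
state$count <- 0L
write_row <- function(x) {
  state$count <- state$count + 1L
  invisible(NULL)
}

ids <- 1:5

# Option 2: a plain for loop; no result is collected.
for (i in ids) write_row(i)

# Option 1: mapply, with the returned list discarded via invisible().
invisible(mapply(write_row, ids, SIMPLIFY = FALSE))

state$count  # number of "writes" performed by both options together
```

Both forms call the function once per element; the mapply version additionally builds a list of return values that is then thrown away.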

What about the foreach package? Have you experienced any performance improvement by using it?

Answer

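Just a couple of comments. A for loop is roughly as fast as apply and its variants, and the real speed-ups come when you vectorise your function as much as possible (that is, using low-level loops, rather than apply, which just hides the for loop). I'm not sure if this is the best example, but consider the following: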

    > n <- 1e06
    > sinI <- rep(NA, n)
    > system.time(for (i in 1:n) sinI[i] <- sin(i))
       user  system elapsed
      3.316   0.000   3.358
    > system.time(sinI <- sapply(1:n, sin))
       user  system elapsed
      5.217   0.016   5.311
    > system.time(sinI <- unlist(lapply(1:n, sin),
    +     recursive = FALSE, use.names = FALSE))
       user  system elapsed
      1.284   0.012   1.303
    > system.time(sinI <- sin(1:n))
       user  system elapsed
      0.056   0.000   0.057
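For anyone who wants to rerun the comparison, the four approaches above can be wrapped into a small script along these lines. It is a sketch rather than a benchmark: n is reduced so it finishes quickly, the variable names are made up for the example, and the timings will differ by machine and R version.

```r
n <- 1e5  # smaller than the 1e06 used above, so the sketch runs quickly

# 1. for loop writing into a preallocated numeric vector
loop_res <- numeric(n)
t_loop <- system.time(for (i in seq_len(n)) loop_res[i] <- sin(i))

# 2. sapply
t_sapply <- system.time(sapply_res <- sapply(seq_len(n), sin))

# 3. lapply followed by unlist
t_lapply <- system.time(
  lapply_res <- unlist(lapply(seq_len(n), sin),
                       recursive = FALSE, use.names = FALSE)
)

# 4. fully vectorised call
t_vec <- system.time(vec_res <- sin(seq_len(n)))

# Elapsed seconds for each approach
c(loop       = t_loop[["elapsed"]],
  sapply     = t_sapply[["elapsed"]],
  lapply     = t_lapply[["elapsed"]],
  vectorised = t_vec[["elapsed"]])
```

All four variants compute the same values, so the only thing being compared is the overhead of how the calls to sin are dispatched.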

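In one of the comments below, Marek points out that the time-consuming part of the for loop above is actually the [<- (indexed assignment) part: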

    > system.time(sinI <- unlist(lapply(1:n, sin),
    +     recursive = FALSE, use.names = FALSE))
       user  system elapsed
      1.284   0.012   1.303
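One way to check the point about the assignment is to time the same loop with and without it. The sketch below does that under assumed names (t_call_only, t_with_assign) and a reduced n; how large the gap is depends on the R version and the machine, so no figures are quoted here.

```r
n <- 1e5

# Loop that only evaluates sin(i); nothing is stored.
t_call_only <- system.time(for (i in seq_len(n)) sin(i))

# Same loop plus the indexed assignment sinI[i] <- ...
# numeric(n) preallocates a double vector; rep(NA, n) would create a
# logical vector that has to be coerced to double on the first assignment.
sinI <- numeric(n)
t_with_assign <- system.time(for (i in seq_len(n)) sinI[i] <- sin(i))

# Elapsed seconds for each variant
c(call_only   = t_call_only[["elapsed"]],
  with_assign = t_with_assign[["elapsed"]])
```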

Bottlenecks that can't immediately be vectorised can be rewritten in C or Fortran, compiled with R CMD SHLIB, and then plugged in with .Call, .C or .Fortran.

Also, see these links for more info about loop optimisation in R. Also check out the article "How Can I Avoid This Loop or Make It Faster?" in R News.
