Loops inefficiency in R

Question

Good morning,

I have been developing for a few months in R and I have to make sure that the execution time of my code is not too long because I analyze big datasets.

Hence, I have been trying to use as many vectorized functions as possible.

However, I am still wondering about something.

What is costly in R is not the loop itself, right? I mean, the problem arises when you start modifying variables within the loop, for example. Is that correct?

Hence I was thinking: what if you simply have to run a function on each element and do not actually care about the result, for example to write data to a database? What should you do?

1) use mapply without storing the result anywhere?

2) do a loop over the vector and only apply f(i) to each element?

3) is there a better function I might have missed?

(that's of course assuming your function is not optimally vectorized).
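For the side-effect case described above, options 1) and 2) might be sketched roughly as follows. Note that write_record() and log_env are hypothetical stand-ins for a real database write (here the function only bumps a counter), so this illustrates the calling pattern and not any actual database API.

```r
# Hypothetical stand-in for a side-effecting call such as a database insert.
# It only counts how often it was called, so the effect can be inspected.
log_env <- new.env()
assign("count", 0L, envir = log_env)
write_record <- function(x) {
  assign("count", get("count", envir = log_env) + 1L, envir = log_env)
  invisible(NULL)
}

ids <- 1:5

# Option 2: a plain for loop; nothing is stored, only the side effect happens.
for (i in ids) write_record(i)

# Option 1: mapply, with the list of NULL results thrown away.
invisible(mapply(write_record, ids, SIMPLIFY = FALSE))

# Number of calls made in total (5 from the loop, 5 from mapply).
print(get("count", envir = log_env))
```

Both forms call the function once per element; the apply-style version additionally builds a list of return values that is then discarded.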

What about the foreach package? Have you experienced any performance improvement by using it?

Solution

Just a couple of comments. A for loop is roughly as fast as apply and its variants; the real speed-ups come when you vectorise your function as much as possible (that is, when the looping is done in low-level compiled code rather than by apply, which just hides the for loop). I'm not sure if this is the best example, but consider the following:

> n <- 1e06
> sinI <- rep(NA,n)
> system.time(for(i in 1:n) sinI[i] <- sin(i))
   user  system elapsed 
  3.316   0.000   3.358 
> system.time(sinI <- sapply(1:n,sin))
   user  system elapsed 
  5.217   0.016   5.311 
> system.time(sinI <- unlist(lapply(1:n,sin),
+       recursive = FALSE, use.names = FALSE))
   user  system elapsed 
  1.284   0.012   1.303 
> system.time(sinI <- sin(1:n))
   user  system elapsed 
  0.056   0.000   0.057 
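The comparison above can also be run as a script. The following is a minimal sketch under a few assumptions of mine: n is reduced so it finishes quickly, the vector is preallocated with numeric(n), and a vapply variant is added that was not part of the original timings. It prints no fixed numbers because timings depend on the machine and the R version.

```r
n <- 1e5  # smaller than the 1e06 used above so the sketch runs quickly

# 1. Explicit for loop writing into a preallocated numeric vector.
loop_res <- numeric(n)
t_loop <- system.time(for (i in seq_len(n)) loop_res[i] <- sin(i))

# 2. sapply: an R-level loop hidden behind a function call.
t_sapply <- system.time(sapply_res <- sapply(seq_len(n), sin))

# 3. vapply (my addition): like sapply, but the result type is declared.
t_vapply <- system.time(vapply_res <- vapply(seq_len(n), sin, numeric(1)))

# 4. Fully vectorised call: the loop runs in compiled code.
t_vec <- system.time(vec_res <- sin(seq_len(n)))

# All four approaches should compute the same values; only timings differ.
stopifnot(isTRUE(all.equal(loop_res, vec_res)),
          isTRUE(all.equal(sapply_res, vec_res)),
          isTRUE(all.equal(vapply_res, vec_res)))

# One row per approach: user, system and elapsed time in seconds.
print(rbind(loop = t_loop, sapply = t_sapply,
            vapply = t_vapply, vectorised = t_vec)[, 1:3])
```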

In one of the comments below, Marek points out that the time-consuming part of the for loop above is actually the [<- part (that is, the assignment sinI[i] <- ...):

> system.time(sinI <- unlist(lapply(1:n,sin),
+       recursive = FALSE, use.names = FALSE))
   user  system elapsed 
  1.284   0.012   1.303 
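One way to check the point about assignment on your own machine is to time the same loop with and without the indexed assignment. This is only a sketch: the variable names are mine, n is reduced for speed, and the size of the gap may be much smaller on recent R versions than in the timings quoted above.

```r
n <- 1e5

# Loop that only calls sin(); each result is discarded.
t_call_only <- system.time(for (i in seq_len(n)) sin(i))

# Same loop, but each result is assigned into a preallocated vector.
out <- numeric(n)
t_with_assign <- system.time(for (i in seq_len(n)) out[i] <- sin(i))

# Compare elapsed times; the difference is the cost of the assignments.
print(c(call_only = t_call_only[["elapsed"]],
        with_assign = t_with_assign[["elapsed"]]))
```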

The bottlenecks which can't immediately be vectorised can be rewritten in C or Fortran, compiled with R CMD SHLIB, and then plugged in with .Call, .C or .Fortran.
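As a rough illustration of that workflow, the sketch below writes a tiny C routine to a temporary directory, compiles it with R CMD SHLIB and calls it through .C. The routine name sin_all, the file names and the temporary build directory are all hypothetical choices of mine, and the compile step assumes a working C toolchain is installed. For anything beyond a toy like this, .Call is often preferred, but .C keeps the example short.

```r
# Hypothetical C routine (sin_all), written to a temporary file for illustration.
c_src <- c(
  "#include <math.h>",
  "/* .C passes every argument as a pointer; the result is written to out. */",
  "void sin_all(double *x, int *n, double *out) {",
  "    int i;",
  "    for (i = 0; i < *n; i++) out[i] = sin(x[i]);",
  "}"
)
build_dir <- tempfile("sin_all_")
dir.create(build_dir)
src_file <- file.path(build_dir, "sin_all.c")
lib_file <- file.path(build_dir, paste0("sin_all", .Platform$dynlib.ext))
writeLines(c_src, src_file)

# Compile with R CMD SHLIB; this step needs a working C compiler.
status <- system2(file.path(R.home("bin"), "R"),
                  c("CMD", "SHLIB", "-o", shQuote(lib_file), shQuote(src_file)))
stopifnot(status == 0)

# Load the shared library and call the routine through .C.
dyn.load(lib_file)
x <- as.numeric(1:10)
res <- .C("sin_all", x = x, n = length(x), out = numeric(length(x)))$out

# The compiled loop should agree with R's own vectorised sin().
print(all.equal(res, sin(x)))
```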

Also, see these links for more info about loop optimisation in R. Also check out the article "How Can I Avoid This Loop or Make It Faster?" in R News.
