R code slowing with increased iterations


Problem description

I've been trying to increase the speed of some code. I've removed all loops, am using vectors, and have streamlined just about everything. I've timed each iteration of my code, and it appears to be slowing down as the iterations increase.

### The beginning iterations
   user  system elapsed 
   0.03    0.00    0.03 
   user  system elapsed 
   0.03    0.00    0.04 
   user  system elapsed 
   0.03    0.00    0.03 
   user  system elapsed 
   0.04    0.00    0.05 

### The ending iterations
   user  system elapsed 
   3.06    0.08    3.14 
   user  system elapsed 
   3.10    0.05    3.15 
   user  system elapsed 
   3.08    0.06    3.15 
   user  system elapsed 
   3.30    0.06    3.37 

I have 598 iterations and right now it takes about 10 minutes. I'd like to speed things up. Here's how my code looks. You'll need the RColorBrewer and fields packages. Here's my data. Yes, I know it's big; make sure you download the zip file.

    ## Requires the RColorBrewer and fields packages
    library(RColorBrewer)   # brewer.pal()
    library(fields)         # image.plot()

    StreamFlux <- function(data, NoR, NTS){
       ### Read in data to display points ###
       WLX = c(8, 19, 29, 20, 13, 20, 21)
       WLY = c(25, 28, 25, 21, 17, 14, 12)
       WLY = 34 - WLY
       WLX = WLX / 44
       WLY = WLY / 33
       timedata = NULL

       mf <- function(i){
          b = (NoR + 8) * (i - 1) + 8

          ### I read in data one section at a time to avoid headers
          mydata = read.table(data, skip = b, nrows = NoR, header = FALSE)
          rows = 34 - mydata[, 2]
          cols = 45 - mydata[, 3]
          flows = mydata[, 7]
          rows = as.numeric(rows)
          cols = as.numeric(cols)
          rm(mydata)

          ### Create flux matrix
          flow_mat <- matrix(0, 44, 33)

          ### Populate matrix ###
          flow_mat[(rows - 1) * 44 + (45 - cols)] <- flows + flow_mat[(rows - 1) * 44 + (45 - cols)]
          flow_mat[flow_mat == 0] <- NA
          rm(flows)
          rm(rows)
          rm(cols)
          timestep = i

          ### Specifying jpeg info ###
          jpeg(paste("Steamflow", timestep, ".jpg", sep = ''),
               width = 640, height = 441, quality = 75, bg = "grey")
          image.plot(flow_mat, zlim = c(-1, 1),
                     col = brewer.pal(11, "RdBu"), yaxt = "n",
                     xaxt = "n", main = paste("Stress Period ",
                                              timestep, sep = ""))
          points(WLX, WLY)
          dev.off()
          rm(flow_mat)
       }

       ### Time each call to mf() and print the timing for that timestep
       ST <- function(x){
          functiontime = system.time(mf(x))
          print(functiontime)
       }
       lapply(1:NTS, ST)
    }

Here's how to run the function:

    ### To run all timesteps ###
    StreamFlux("stream_out.txt", 687, 598)
    ### To run the first 100 timesteps ###
    StreamFlux("stream_out.txt", 687, 100)
    ### The first 200 timesteps ###
    StreamFlux("stream_out.txt", 687, 200)

To test, remove print(functiontime) so it stops printing at every timestep, then:

> system.time(StreamFlux("stream_out.txt",687,100))
  user  system elapsed 
  28.22    1.06   32.67 
> system.time(StreamFlux("stream_out.txt",687,200))
   user  system elapsed 
 102.61    2.98  106.20 

What I'm looking for is any way to speed up running this code, and possibly an explanation of why it is slowing down. Should I just run it in parts? That seems like a stupid solution. I've read about dlply from the plyr package. It seems to have worked here, but would that help in my case? How about parallel processing? I think I can figure that out, but is it worth the trouble in this case?
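
For what it's worth, a rough sketch of that parallel option is below. It is illustrative only: it assumes mf() stays self-contained with each timestep writing its own jpeg file, and parallelism would not by itself explain a per-iteration slowdown.

    ## Illustrative only: what swapping the final lapply(1:NTS, ST) call inside
    ## StreamFlux() for a forked parallel loop might look like. mclapply() from
    ## the base "parallel" package forks on Linux/macOS; on Windows you would
    ## need parLapply() with a PSOCK cluster instead.
    library(parallel)
    mclapply(1:NTS, ST, mc.cores = max(1, detectCores() - 1))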

Recommended answer

I will follow @PaulHiemstra's suggestion and post my comment as an answer. Who can resist Internet points? ;)

From a quick glance at your code, I agree with @joran's second point in his comment: your loop/function is probably slowing down due to repeatedly reading in your data. More specifically, this part of the code probably needs to be fixed:

    read.table(data, skip = b, nrows = NoR, header = FALSE)

In particular, I think the skip = b argument is the culprit: read.table still has to scan past all of the skipped lines, so each call gets more expensive as b grows with the timestep. You should read in all the data at the beginning, if possible, and then retrieve the necessary parts from memory for the calculations.
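
To illustrate, here is a minimal sketch of the "read once, index from memory" idea. It assumes the file layout implied by the skip = b arithmetic (each timestep block is 8 header lines followed by NoR data rows), and readAllBlocks() is a hypothetical helper name, not part of the original code:

    ## Minimal sketch, assuming each timestep block in the file is 8 header
    ## lines followed by NoR data rows (as the skip = b arithmetic implies).
    ## readAllBlocks() is a hypothetical helper, not part of the original code.
    readAllBlocks <- function(file, NoR, NTS){
       all_lines <- readLines(file)                    # single pass over the file
       lapply(seq_len(NTS), function(i){
          start <- (NoR + 8) * (i - 1) + 8 + 1         # first data row of block i
          block <- all_lines[start:(start + NoR - 1)]  # slice block i from memory
          read.table(text = block, header = FALSE)     # parse only this block
       })
    }

    ## mf() would then take the pre-read data frame for timestep i, e.g.
    ## blocks <- readAllBlocks("stream_out.txt", 687, 598)
    ## mydata <- blocks[[i]]   # instead of read.table(data, skip = b, ...)

This way the file is scanned once instead of once per timestep, so the cost of an iteration no longer grows with the timestep number.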
