Replace rbind in for-loop with lapply? (2nd circle of hell)


Problem description

I am having trouble optimising a piece of R code. The following example code should illustrate my optimisation problem:

Some initialisations and a function definition:

columns <- 6  # must be defined before values and solution are created
a <- c(10,20,30,40,50,60,70,80)
b <- c("a","b","c","d","z","g","h","r")
c <- c(1,2,3,4,5,6,7,8)
myframe <- data.frame(a,b,c)
values <- vector(length=columns)
solution <- matrix(nrow=nrow(myframe),ncol=columns+3)

myfunction <- function(frame,columns){
   athing <- 0
   if(columns == 5){
      athing <- 100
   } else {
      athing <- 1000
   }
   # values is not an argument: the assignment works on a local copy of the global vector
   values[columns+1] <- athing
   return(values)
}

The problematic for-loop looks like this:

columns = 6
for(i in 1:nrow(myframe)){
   values <- myfunction(as.matrix(myframe[i,]), columns)
   values[columns+2] = i
   values[columns+3] = myframe[i,3]
   #more columns added with simple operations (i.e. sum)

   solution <- rbind(solution,values)
   #solution is a large matrix from outside the for-loop
}

The problem seems to be the rbind function. I frequently get error messages about the size of solution, which seems to become too large after a while (more than 50 MB). I want to replace this loop and the rbind with a list and lapply and/or foreach. I have started by converting myframe to a list.

myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])

I have not really come further than this, although I tried applying this very good introduction to parallel processing.

How can I restructure the for-loop without having to change myfunction? Obviously I am open to different solutions...

Edit: This problem seems to be straight from the 2nd circle of hell from the R Inferno. Any suggestions?

Solution

Using rbind in a loop like this is bad practice because each iteration enlarges your solution object and then copies it to a new object. That is a very slow process and can also lead to memory problems. One way around this is to create a list whose ith component stores the output of the ith loop iteration. The final step is to call rbind on that list (just once at the end). This will look something like

my.list <- vector("list", nrow(myframe))
for(i in 1:nrow(myframe)){
    # Call all necessary commands to create values
    my.list[[i]] <- values
}
solution <- rbind(solution, do.call(rbind, my.list))
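
Since the question asks about lapply specifically, the same idea can also be written without an explicit loop. The following is only a sketch under a few assumptions: the setup is copied from the question with its typos fixed (columns defined first, values used consistently inside myfunction), and make_row and rows are hypothetical names introduced here for illustration. Unlike the snippet above, it builds solution from the new rows only, rather than appending them to a pre-existing matrix.

```r
# Setup from the question, with its typos fixed (assumed corrections).
columns <- 6
myframe <- data.frame(
  a = c(10, 20, 30, 40, 50, 60, 70, 80),
  b = c("a", "b", "c", "d", "z", "g", "h", "r"),
  c = c(1, 2, 3, 4, 5, 6, 7, 8)
)
values <- vector(length = columns)

myfunction <- function(frame, columns) {
  athing <- if (columns == 5) 100 else 1000
  values[columns + 1] <- athing  # works on a local copy of the global values
  values
}

# Hypothetical helper: the body of the original for-loop, for one row index.
make_row <- function(i) {
  row <- myfunction(as.matrix(myframe[i, ]), columns)
  row[columns + 2] <- i
  row[columns + 3] <- myframe[i, 3]
  row
}

# Collect one vector per row in a list, then bind them all in a single call.
rows <- lapply(seq_len(nrow(myframe)), make_row)
solution <- do.call(rbind, rows)
```

Here myfunction is left unchanged in behaviour, and rbind is called once on the whole list instead of once per iteration, so the growing matrix is not copied repeatedly.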
