Replace rbind in for-loop with lapply? (2nd circle of hell)

Question

I am having trouble optimising a piece of R code. The following example code should illustrate my optimisation problem.

Some initialisations and a function definition:

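columns <- 6  # must be defined before it is used below
a <- c(10, 20, 30, 40, 50, 60, 70, 80)
b <- c("a", "b", "c", "d", "z", "g", "h", "r")
c <- c(1, 2, 3, 4, 5, 6, 7, 8)
myframe <- data.frame(a, b, c)
values <- vector(length = columns)
solution <- matrix(nrow = nrow(myframe), ncol = columns + 3)

myfunction <- function(frame, columns) {
  # start from a fresh vector so the function does not depend on globals
  values <- vector(length = columns)
  athing <- 0
  if (columns == 5) {
    athing <- 100
  } else {
    athing <- 1000
  }
  values[columns + 1] <- athing
  return(values)
}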

The problematic for-loop looks like this:

columns <- 6
for (i in seq_len(nrow(myframe))) {
   values <- myfunction(as.matrix(myframe[i, ]), columns)
   values[columns + 2] <- i
   values[columns + 3] <- myframe[i, 3]
   # more columns added with simple operations (i.e. sum)

   solution <- rbind(solution, values)
   # solution is a large matrix from outside the for-loop
}

The problem seems to be the rbind function. I frequently get error messages regarding the size of solution, which seems to become too large after a while (more than 50 MB). I want to replace this loop and the rbind with a list and lapply and/or foreach. I have started by converting myframe to a list.

myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])

That is as far as I have got, although I tried to apply this very good introduction to parallel processing.

How can I restructure the for-loop without having to change myfunction? Obviously I am open to different solutions...

This problem seems to be straight from the 2nd circle of hell of the R Inferno. Any suggestions?

Recommended answer

The reason using rbind in a loop like this is bad practice is that each iteration enlarges your solution matrix and copies it to a new object, which is very slow and can also lead to memory problems. One way around this is to create a list whose ith component stores the output of the ith loop iteration. The final step is to call rbind on that list (just once, at the end). This will look something like:

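my.list <- vector("list", nrow(myframe))
for (i in seq_len(nrow(myframe))) {
    # Call all necessary commands to create values, e.g. as in the question:
    values <- myfunction(as.matrix(myframe[i, ]), columns)
    values[columns + 2] <- i
    values[columns + 3] <- myframe[i, 3]
    my.list[[i]] <- values
}
# Bind all rows once, after the loop
solution <- rbind(solution, do.call(rbind, my.list))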
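
Since the question asks about lapply specifically, the same idea can be sketched without an explicit loop. This is only a minimal sketch under some assumptions: myfunction is a repaired version of the one in the question (the original refers to undefined names), build_row is a hypothetical helper name introduced here for illustration, and solution is built from scratch rather than appended to an existing matrix.

```r
# Example data from the question
columns <- 6
myframe <- data.frame(
  a = c(10, 20, 30, 40, 50, 60, 70, 80),
  b = c("a", "b", "c", "d", "z", "g", "h", "r"),
  c = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Assumed repair of myfunction from the question: returns a numeric
# vector with the marker value stored at position columns + 1
myfunction <- function(frame, columns) {
  values <- numeric(columns)
  values[columns + 1] <- if (columns == 5) 100 else 1000
  values
}

# Hypothetical helper: builds the output vector for row i of myframe
build_row <- function(i) {
  values <- myfunction(as.matrix(myframe[i, ]), columns)
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  values
}

# lapply collects one vector per row in a list; nothing is grown step by step
rows <- lapply(seq_len(nrow(myframe)), build_row)

# Bind all rows into a matrix with a single rbind call
solution <- do.call(rbind, rows)
```

If the rows already stored in solution have to be kept, rbind(solution, do.call(rbind, rows)) appends the whole new block in one step, as in the loop version above.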
