foreach() garbage collection

Problem description

I'm using nested foreach from the doSMP package to generate results based on a function I developed. Ordinarily the problem would use three nested loops, but due to the size of results generated (around 80,000 for each i), I've had to pause compilation and write the results to file when the final results matrix exceeds a specified number of rows.

i = 1
write.off = 1

while(i <= length(i.vector)){
  # Placeholder first row so rbind() has something to append to; dropped before writing
  results.frame = as.data.frame(matrix(NA, ncol = 3, nrow = 1))

  # Accumulate results for successive values of i until the frame gets too large
  while(nrow(results.frame) < 500000 && i <= length(i.vector)){
    results = foreach(j = 1:length(j.vector), .combine = "rbind", .inorder = TRUE) %:%
    foreach(k = 1:length(k.vector), .combine = "rbind", .inorder = TRUE) %dopar%{

      ith.value = i.vector[i]
      jth.value = j.vector[j]
      kth.value = k.vector[k]
      my.function(ith.value, jth.value, kth.value)
    }

    results.frame = rbind(results.frame, results)
    i = i + 1
  }

  # Drop the placeholder row and write this batch to its own file
  results.frame = results.frame[-1,]
  write.table(results.frame, paste("part_", write.off, sep = ""))
  write.off = write.off + 1
}

The problem I'm having is with garbage collection. The workers don't seem to release memory back to the system, so by i = 4 each of them has eaten up around 6GB of memory.

I've tried inserting gc() into the foreach loop directly as well as into the underlying function, and I've also tried assigning the function and its results to a named environment that I can clear periodically. None of these methods have worked.

I feel like foreach's initEnvir and finalEnvir parameters might offer a solution, but the documentation and examples haven't really shed much light on this.

I'm running this code on a VM operating Windows Server 2008.

Solution

You might consider avoiding this issue altogether by writing a different loop.

Consider using the gen.factorial function in AlgDesign, a la:

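library(AlgDesign)
# One row per (i, j, k) combination; with center = FALSE the entries are level indices (1..n)
fact1 = gen.factorial(c(length(i.vector), length(j.vector), length(k.vector)), nVars = 3, center = FALSE)
foreach(ix_row = 1:nrow(fact1)) %dopar% {
  # Map the level indices back to the actual values before calling the function
  my.function(i.vector[fact1[ix_row, 1]], j.vector[fact1[ix_row, 2]], k.vector[fact1[ix_row, 3]])
}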
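For comparison, here is a minimal sketch of the same flattening idea using base R's expand.grid instead of AlgDesign, so it runs without extra packages. The input vectors and the body of my.function are made-up placeholders, and the lapply call stands in for the spot where a single foreach(...) %dopar% would go once a parallel backend is registered.

```r
# Hypothetical stand-ins for the question's inputs.
i.vector <- c(10, 20)
j.vector <- c(1, 2, 3)
k.vector <- c(0.5, 1.5)

# Placeholder for the asker's function: returns one row of three values.
my.function <- function(ith.value, jth.value, kth.value) {
  c(ith.value, jth.value, ith.value * jth.value + kth.value)
}

# One row per (i, j, k) combination. expand.grid varies its first
# argument fastest, so listing k first keeps the original loop order
# (i outermost, k innermost).
grid <- expand.grid(k = k.vector, j = j.vector, i = i.vector)

# A single flat loop over the grid replaces the three nested loops.
rows <- lapply(seq_len(nrow(grid)), function(r) {
  my.function(grid$i[r], grid$j[r], grid$k[r])
})

# Combine the per-combination rows into one matrix.
results <- do.call(rbind, rows)
```

Because the grid holds the actual values rather than positions, each iteration needs nothing beyond its own row, which keeps the loop body easy to hand to a worker.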

You could also use memory-mapped files and pre-allocate the output storage with bigmemory (assuming you're creating a matrix), which would let each worker store its own output.

In this way, your overall memory usage should drop dramatically.
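A rough sketch of the "each worker stores its own output" idea follows. Note that it swaps in plain per-chunk files for bigmemory, purely so that it runs with base R alone; the vectors, my.function, and process.one.i are hypothetical placeholders. Each task writes its rows to disk and returns only a file path, so nothing large accumulates in the calling process.

```r
# Hypothetical stand-ins for the question's inputs.
i.vector <- c(10, 20, 30)
j.vector <- c(1, 2)
k.vector <- c(0.5, 1.5)

# Placeholder for the asker's function: returns one row of three values.
my.function <- function(ith.value, jth.value, kth.value) {
  c(ith.value, jth.value, ith.value * jth.value + kth.value)
}

# Scratch directory for the part files (an assumption; any writable path works).
out.dir <- tempfile("parts_")
dir.create(out.dir)

# Handle one value of i: compute its rows, write them to their own file,
# and return only the path instead of the data.
process.one.i <- function(i) {
  grid <- expand.grid(k = k.vector, j = j.vector)
  rows <- lapply(seq_len(nrow(grid)), function(r) {
    my.function(i.vector[i], grid$j[r], grid$k[r])
  })
  part <- as.data.frame(do.call(rbind, rows))
  path <- file.path(out.dir, paste("part_", i, sep = ""))
  write.table(part, path)
  path
}

# Run sequentially here; with a backend registered, this is the loop that
# would be handed to foreach(...) %dopar%, one task per value of i.
part.files <- vapply(seq_along(i.vector), process.one.i, character(1))
```

The caller ends up holding a short character vector of file names, and the parts can be read back one at a time with read.table when needed.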


Update 1: It seems that memory issues are endemic to doSMP. Check out the following posts:

I recall seeing another memory issue for doSMP, either as a question or in the R chat, but I can't seem to recover the post.

Update 2: I don't know if this will help, but you might try using an explicit return() (e.g. return(my.function(ith.value, jth.value, kth.value))). In my code, I generally use an explicit return() for clarity.
