Parallel computations on Reference Classes

Question

I have a list of fairly large objects that I want to apply a complicated function to in parallel, but my current method uses too much memory. I thought Reference Classes might help, but using mclapply to modify them doesn't seem to work.

The function modifies the object itself, so I overwrite the original object with the new one. Since the object is a list and I'm only modifying a small part of it, I was hoping that R's copy-on-modify semantics would avoid making multiple copies; however, when I run it, that doesn't seem to be the case. Here's a small example of the base R approach I have been using. It correctly resets the balance to zero.

library(parallel)  ## provides mclapply
## make a list of accounts, each with a balance
## and a function to reset the balance
foo <- lapply(1:5, function(x) list(balance=x))
reset1 <- function(x) {x$balance <- 0; x}
foo[[4]]$balance
## 4 ## BEFORE reset
foo <- mclapply(foo, reset1)
foo[[4]]$balance
## 0 ## AFTER reset
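
The reassignment in `foo <- mclapply(foo, reset1)` is needed because base R lists have value semantics: a function that modifies its argument changes its own local copy, and the caller only sees the change through the return value. A minimal illustration (the names `acct` and `reset_acct` are just for this sketch):

```r
## reset1() as defined above: modifies its local copy of x and returns it
reset1 <- function(x) {x$balance <- 0; x}

acct <- list(balance=4)
reset_acct <- reset1(acct)
acct$balance        ## 4: the caller's list is unchanged
reset_acct$balance  ## 0: only the returned copy has the new balance
```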

It seems that Reference Classes might help because they are mutable, and with lapply they do behave as I expect: the balance is reset to zero.

Account <- setRefClass("Account", fields=list(balance="numeric"),
                       methods=list(reset=function() {balance <<- 0}))

foo <- lapply(1:5, function(x) Account$new(balance=x))
foo[[4]]$balance
## 4
invisible(lapply(foo, function(x) x$reset()))
foo[[4]]$balance
## 0
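
Unlike lists, assigning a Reference Class object to another name doesn't copy it: both names refer to the same object, which is why `x$reset()` inside `lapply` changes the elements of `foo`. A short sketch using the `Account` class above; `copy()` is the built-in method for getting an independent object when one is actually wanted (`a`, `b`, `a2` are illustrative names):

```r
Account <- setRefClass("Account", fields=list(balance="numeric"),
                       methods=list(reset=function() {balance <<- 0}))

a <- Account$new(balance=4)
b <- a              ## no copy: a and b refer to the same object
b$reset()
a$balance           ## 0: the reset is visible through a

a2 <- a$copy()      ## copy() creates an independent object
a2$balance <- 10
a$balance           ## still 0
```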

But when I use mclapply, the balance isn't reset. Note that on Windows, or with mc.cores=1, mclapply simply calls lapply instead, so the problem won't show up there.

foo <- lapply(1:5, function(x) Account$new(balance=x))
foo[[4]]$balance
## 4
invisible(mclapply(foo, function(x) x$reset()))
foo[[4]]$balance
## 4
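
On Unix-alikes, mclapply forks child processes, and each child works on its own copy of the parent's memory. Reference Class fields live in an environment, so `x$reset()` runs against the child's copy and the change is lost when the child exits. The same thing happens with a plain environment. This sketch assumes a Unix-alike (`mc.cores=2` is not supported on Windows) and contrasts it with `mc.cores=1`, where mclapply falls back to lapply in the same process; `bump()` is a hypothetical helper:

```r
library(parallel)

## hypothetical helper: returns a function that increments e$count
bump <- function(e) function(i) {e$count <- e$count + 1; invisible(NULL)}

e_fork <- new.env(); e_fork$count <- 0
invisible(mclapply(1:4, bump(e_fork), mc.cores=2))
e_fork$count        ## 0: the increments happened in the children

e_same <- new.env(); e_same$count <- 0
invisible(mclapply(1:4, bump(e_same), mc.cores=1))
e_same$count        ## 4: mc.cores=1 runs lapply in this process
```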

What's going on? How can I work with Reference Classes in parallel? Is there a better way altogether to avoid unnecessary copying of objects?

Answer

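I think the forked processes, while they can read all the variables in the workspace, can't change the parent's copies of them: each child works on its own copy of the parent's memory, and only the function's return value is sent back to the parent. So returning the modified object and assigning the result works, but I don't know yet whether it improves the memory issues.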

foo <- mclapply(foo, function(x) {x$reset(); x})
foo[[4]]$balance
## 0
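
mclapply passes each child's return value back to the parent by serializing it, so after this assignment `foo` holds new objects reconstructed in the parent, not the originals; with large objects each returned element is a full copy. One hypothetical way around that is to have the children return only the small piece that changed and apply it in the parent. A sketch assuming a Unix-alike; `new_balance()` is a made-up stand-in for the expensive computation, not part of the original code:

```r
library(parallel)

Account <- setRefClass("Account", fields=list(balance="numeric"),
                       methods=list(reset=function() {balance <<- 0}))
foo <- lapply(1:5, function(x) Account$new(balance=x))

## the answer's pattern: the returned objects are copies made in the parent
returned <- mclapply(foo, function(x) {x$reset(); x}, mc.cores=2)
foo[[4]]$balance       ## 4: the original object was never touched
returned[[4]]$balance  ## 0

## hypothetical alternative: children return only the new balance
new_balance <- function(acct) 0   ## stand-in for the real computation
updates <- mclapply(foo, new_balance, mc.cores=2)
for (i in seq_along(foo)) foo[[i]]$balance <- updates[[i]]
foo[[4]]$balance       ## 0
```

Whether this saves memory in practice depends on the objects and on what the computation touches; treat it as something to measure, not a guaranteed fix.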
