Shared memory in parallel foreach in R
Problem Description
I have a big matrix c, loaded in RAM. My goal is to have read-only access to it through parallel processing. Is there a way to properly create shared memory which all the processes may read from, without creating a local copy of all the data? However, when I create the connections, whether I use doSNOW, doMPI, big.matrix, etc., the amount of RAM used increases dramatically.

Example: RAM: [screenshot of memory usage]
libs <- function(libraries){ # install missing libraries, then load them
  for (lib in libraries){
    if (!is.element(lib, .packages(all.available = TRUE))) {
      install.packages(lib)
    }
    library(lib, character.only = TRUE)
  }
}

libra <- list("foreach", "parallel", "doSNOW", "bigmemory")
libs(libra)
# create a matrix of approximately 1GB in size
c <- matrix(runif(10000^2), 10000, 10000)
# convert it to a big.matrix
x <- as.big.matrix(c)
# get a description of the matrix
mdesc <- describe(x)
# create the required connections
cl <- makeCluster(detectCores())
registerDoSNOW(cl)
out <- foreach(linID = 1:10, .combine = c) %dopar% {
  # load bigmemory
  require(bigmemory)
  # attach the matrix via shared memory??
  m <- attach.big.matrix(mdesc)
  # dummy expression to test data acquisition
  c <- m[1, 1]
}
closeAllConnections()
In the image above, you may find that the memory increases a lot until the foreach loop ends and it is freed.
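A back-of-the-envelope calculation shows why one copy per worker blows up RAM so quickly (the core count of 4 below is a hypothetical, for illustration only):

```r
# One 10000 x 10000 matrix of doubles, at 8 bytes per element:
n <- 10000
bytes_per_copy <- n * n * 8               # 8e8 bytes, roughly 0.75 GiB

# If foreach auto-exports the matrix, every worker holds its own copy.
workers <- 4                              # hypothetical core count
total_gib <- workers * bytes_per_copy / 1024^3
```

With makeCluster(detectCores()) on a many-core machine, this multiplies up fast, which matches the memory growth seen in the screenshot.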
I think the solution to the problem can be seen from the post of Steve Weston, the author of the foreach package, here. There he states:

"The doParallel package will auto-export variables to the workers that are referenced in the foreach loop."

So I think the problem is that in your code your big matrix c is referenced in the assignment c <- m[1,1]. Just try xyz <- m[1,1] instead and see what happens.

Here is an example with a file-backed big.matrix:

# create a matrix of approximately 1GB in size
n <- 10000
m <- 10000
c <- matrix(runif(n*m), n, m)
# convert it to a big.matrix
x <- as.big.matrix(x = c, type = "double",
                   separated = FALSE,
                   backingfile = "example.bin",
                   descriptorfile = "example.desc")
# get a description of the matrix
mdesc <- describe(x)
# create the required connections
cl <- makeCluster(detectCores())
registerDoSNOW(cl)
## 1) No referencing
out <- foreach(linID = 1:4, .combine = c) %dopar% {
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i, j]
    }
  }
  return(0L)
}
## 2) Referencing
out <- foreach(linID = 1:4, .combine = c) %dopar% {
  invisible(c) ## c is referenced and thus exported to the workers
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i, j]
    }
  }
  return(0L)
}
closeAllConnections()
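The copy-on-export behaviour described above is not specific to foreach; it is how PSOCK clusters from base R's parallel package work in general. A minimal, self-contained sketch (using only the parallel package and a small stand-in matrix) of workers starting empty and each receiving its own copy on export:

```r
library(parallel)

# PSOCK workers (the kind makeCluster creates, also used by doSNOW)
# are fresh R processes with empty workspaces.
cl <- makeCluster(2)

big <- matrix(runif(100), 10, 10)   # small stand-in for the 1GB matrix

# Before any export, the workers cannot see `big` at all.
stopifnot(!clusterEvalQ(cl, exists("big"))[[1]])

# clusterExport serializes `big` and copies it into every worker --
# the same mechanism behind foreach's automatic export, and the
# reason RAM use grows by one matrix per worker.
clusterExport(cl, "big")
stopifnot(clusterEvalQ(cl, exists("big"))[[1]])

rowsums <- parSapply(cl, 1:10, function(i) sum(big[i, ]))
stopifnot(length(rowsums) == 10)

stopCluster(cl)
```

On Unix, forking backends (mclapply, doMC) share the master's memory pages copy-on-write instead of duplicating the matrix up front, which is another way to sidestep the copies besides a file-backed big.matrix.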