Shared memory in parallel foreach in R

Problem Description:

I have a big matrix c loaded in RAM. My goal is to have read-only access to it through parallel processing. However, when I create the connections, whether I use doSNOW, doMPI, big.matrix, etc., the amount of RAM used increases dramatically.

Is there a way to properly create shared memory that all the processes can read from, without creating a local copy of all the data?

Example:

libs <- function(libraries){ # Installs missing libraries and then loads them
  for (lib in libraries){
    if( !is.element(lib, .packages(all.available = TRUE)) ) {
      install.packages(lib)
    }
    library(lib,character.only = TRUE)
  }
}

libra <- list("foreach","parallel","doSNOW","bigmemory")
libs(libra)

# create a matrix of approximately 1 GB
c <- matrix(runif(10000^2), 10000, 10000)
# convert it to a big.matrix
x <- as.big.matrix(c)
# get a description of the matrix
mdesc <- describe(x)
# Create the required connections
cl <- makeCluster(detectCores())
registerDoSNOW(cl)
out <- foreach(linID = 1:10, .combine = c) %dopar% {
  # load bigmemory
  require(bigmemory)
  # attach the matrix via shared memory??
  m <- attach.big.matrix(mdesc)
  # dummy expression to test data acquisition
  c <- m[1, 1]
}
closeAllConnections()

RAM: in a plot of memory usage during the run (not reproduced here), you can see that memory use increases substantially until foreach ends, at which point it is freed.

Solution

I think the solution to the problem can be seen in a post by Steve Weston, the author of the foreach package, here. There he states:

The doParallel package will auto-export variables to the workers that are referenced in the foreach loop.

So I think the problem is that in your code the big matrix c is referenced in the assignment c <- m[1,1]. Just try xyz <- m[1,1] instead and see what happens.
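For illustration, here is the loop from the question with only that assignment renamed (a minimal sketch; it assumes mdesc and the doSNOW cluster from the example above are still in scope):

out <- foreach(linID = 1:10, .combine = c) %dopar% {
  require(bigmemory)
  # attach the shared-memory matrix via its descriptor;
  # this maps the existing segment instead of copying the data
  m <- attach.big.matrix(mdesc)
  # fresh name: the big matrix c is no longer referenced in the body,
  # so it is not auto-exported to the workers
  xyz <- m[1, 1]
  xyz
}

Because nothing in the loop body refers to c, the auto-export analysis finds nothing large to ship to the workers.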

Here is an example with a file-backed big.matrix:

# create a matrix of approximately 1 GB
n <- 10000
m <- 10000
c <- matrix(runif(n*m), n, m)
# convert it to a file-backed big.matrix
x <- as.big.matrix(x = c, type = "double",
                   separated = FALSE,
                   backingfile = "example.bin",
                   descriptorfile = "example.desc")
# get a description of the matrix
mdesc <- describe(x)
# Create the required connections
cl <- makeCluster(detectCores())
registerDoSNOW(cl)
## 1) No referencing
out <- foreach(linID = 1:4, .combine=c) %dopar% {
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i,j]
    }
  }
  return(0L)
}

## 2) Referencing
out <- foreach(linID = 1:4, .combine=c) %dopar% {
  invisible(c) ## c is referenced and thus exported to workers
  t <- attach.big.matrix("example.desc")
  for (i in seq_len(30L)) {
    for (j in seq_len(m)) {
      y <- t[i,j]
    }
  }
  return(0L)
}
closeAllConnections()
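
As an alternative to renaming the variable, foreach also accepts a .noexport argument (a character vector of variable names that should not be auto-exported), so you should be able to keep the reference to c in the body and still suppress the copy. A sketch under the same file-backed setup as above:

## 3) Referencing, but with the export suppressed
out <- foreach(linID = 1:4, .combine=c,
               .noexport = "c") %dopar% {
  invisible(c)  # still referenced, but excluded from export by .noexport
  t <- attach.big.matrix("example.desc")
  y <- t[1, 1]
  return(0L)
}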
