R doParallel中的并行处理foreach保存数据 [英] Parallel proccessing in R doParallel foreach save data

查看:159
本文介绍了R doParallel中的并行处理foreach保存数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在使并行处理部分正常工作方面已经取得了进展,但是将向量与获取距离一起保存不能正常工作.我得到的错误是

Progress has been made on getting the parallel processing part working but saving the vector with the fetch distances is not working properly. The error I get is

df_Test_Fetch <- data.frame(x_lake_length)
Error in data.frame(x_lake_length) : object 'x_lake_length' not found
write.table(df_Test_Fetch,file="C:/tempTest_Fetch.csv",row.names=TRUE,col.names=TRUE, sep=",")
Error in is.data.frame(x) : object 'df_Test_Fetch' not found

我尝试更改下面的代码,以便将foreach步骤输出到x_lake_length.但这并没有输出我所希望的向量.我如何才能将实际结果保存到csv文件中.我正在使用R x64 3.3.0运行Windows 8计算机.

I have tried altering the code below so that the foreach step is output to x_lake_length. But that did not output the vector as I hoped. How can I get the actually results to be saved to a csv file. I am running a windows 8 computer with R x64 3.3.0.

预先感谢您 仁

这是完整的代码.

 # make sure there is no prexisting data
rm(x_lake_length)

# Libraries ---------------------------------------------------------------
if (!require("pacman")) install.packages("pacman")
pacman::p_load(lakemorpho,rgdal,maptools,sp,doParallel,foreach,
               doParallel)

# HPC ---------------------------------------------------------------------
cores_2_use <- detectCores() - 2
cl          <- makeCluster(cores_2_use, useXDR = F)
clusterSetRNGStream(cl, 9956)
registerDoParallel(cl, cores_2_use)

# Data --------------------------------------------------------------------

ogrDrivers()

dsn <- system.file("vectors", package = "rgdal")[1]
# the line below is commented out but when I run the script on my data the line below is what I use instead of the one above
# then making the name changes as needed
# dsn<-setwd("J:\\Elodea\\ByHUC6\\")
ogrListLayers(dsn)
ogrInfo(dsn=dsn, layer="trin_inca_pl03")
owd <- getwd()
setwd(dsn)
ogrInfo(dsn="trin_inca_pl03.shp", layer="trin_inca_pl03")
setwd(owd)
x <- readOGR(dsn=dsn, layer="trin_inca_pl03")
summary(x)

# Analysis ----------------------------------------------------------------  
myfun <- function(x,i){tmp<-lakeMorphoClass(x[i,],NULL,NULL,NULL)
x_lake_length<-vector("numeric",length = nrow(x))
x_lake_length[i]<-lakeMaxLength(tmp,200)
print(i)
Sys.sleep(0.1)}

foreach(i = 1:nrow(x),.combine=cbind,.packages=c("lakemorpho","rgdal"))  %dopar% (
  myfun(x,i)
)
options(digits=10)
df_Test_Fetch <- data.frame(x_lake_length)
write.table(df_Test_Fetch,file="C:/temp/Test_Fetch.csv",row.names=TRUE,col.names=TRUE, sep=",")
print(proc.time())

推荐答案

我认为这就是您想要的,尽管如果不了解主题,我就不能100%确定.

I think this is what you want, though without understanding the subject matter I can't be 100% sure.

我要做的是在您的并行化函数中添加一个return(),并在调用foreach时将返回的对象的值分配给x_lake_length.但是我只是猜测那是您要尝试的操作,所以如果我写错了,请纠正我.

What I did was add a return() to your parallelized function and assigned the value of that returned object to x_lake_length when you call the foreach. But I'm only guessing that that's what you were trying to do, so please correct me if I'm wrong.

# make sure there is no prexisting data
rm(x_lake_length)

# Libraries ---------------------------------------------------------------
if (!require("pacman")) install.packages("pacman")
pacman::p_load(lakemorpho,rgdal,maptools,sp,doParallel,foreach,
               doParallel)

# HPC ---------------------------------------------------------------------
cores_2_use <- detectCores() - 2
cl          <- makeCluster(cores_2_use, useXDR = F)
clusterSetRNGStream(cl, 9956)
registerDoParallel(cl, cores_2_use)

# Data --------------------------------------------------------------------

ogrDrivers()

dsn <- system.file("vectors", package = "rgdal")[1]
# the line below is commented out but when I run the script on my data the line below is what I use instead of the one above
# then making the name changes as needed
# dsn<-setwd("J:\\Elodea\\ByHUC6\\")
ogrListLayers(dsn)
ogrInfo(dsn=dsn, layer="trin_inca_pl03")
owd <- getwd()
setwd(dsn)
ogrInfo(dsn="trin_inca_pl03.shp", layer="trin_inca_pl03")
setwd(owd)
x <- readOGR(dsn=dsn, layer="trin_inca_pl03")
summary(x)

# Analysis ----------------------------------------------------------------  
myfun <- function(x,i){tmp<-lakeMorphoClass(x[i,],NULL,NULL,NULL)
                      x_lake_length<-vector("numeric",length = nrow(x))
                      x_lake_length[i]<-lakeMaxLength(tmp,200)
                      print(i)
                      Sys.sleep(0.1)
                      return(x_lake_length)
}

x_lake_length <- foreach(i = 1:nrow(x),.combine=cbind,.packages=c("lakemorpho","rgdal"))  %dopar% (
  myfun(x,i)
)

options(digits=10)
df_Test_Fetch <- data.frame(x_lake_length)
write.table(df_Test_Fetch,file="C:/temp/Test_Fetch.csv",row.names=TRUE,col.names=TRUE, sep=",")
print(proc.time())

这篇关于R doParallel中的并行处理foreach保存数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆