Loading files in parallel not working with foreach + data.table
Problem description
I would like to use foreach in conjunction with data.table (v.1.8.7) to load files and bind them. foreach is not parallelizing, and it returns a warning...

    write.table(matrix(rnorm(5e6), nrow = 5e5), "myFile.csv", quote = FALSE, sep = ",", row.names = FALSE, col.names = TRUE)
    library(data.table)
    # I use fread from data.table 1.8.7 (dev) for performance and usability
    DT <- fread("myFile.csv")
Now suppose I have n of those files to load and row-bind, and I would like to parallelize that. (I am on Windows, so no forking.)

    allFiles <- rep("myFile.csv", 4)  # you can change 4 to whatever

Using lapply:

    f1 <- function(allFiles) {
      DT <- lapply(allFiles, FUN = fread)  # loads each entry of allFiles sequentially with fread
      DT <- rbindlist(DT)
      return(DT)
    }

Using parallel (part of R as of 2.14.0):

    library(parallel)
    f2 <- function(allFiles) {
      mc <- detectCores()                          # how many cores?
      cl <- makeCluster(mc)                        # build the cluster
      DT <- parLapply(cl, allFiles, fun = fread)   # call fread on each core (well... using each core at least)
      stopCluster(cl)
      DT <- rbindlist(DT)
      return(DT)
    }

Now I want to use foreach:

    library(foreach)
    f3 <- function(allFiles) {
      DT <- foreach(myFile = allFiles, .combine = 'rbind', .inorder = FALSE) %dopar% fread(myFile)
      return(DT)
    }

Here are some benchmarks confirming that I can't get foreach to work in parallel:

    system.time(DT <- f1(allFiles))
       user  system elapsed
      34.61    0.14   34.84

    system.time(DT <- f2(allFiles))
       user  system elapsed
       1.03    0.40   24.30

    system.time(DT <- f3(allFiles))
    executing %dopar% sequentially: no parallel backend registered
       user  system elapsed
      35.05    0.22   35.38

Solution

Just to get this answered:

As the warning message tells you, there is no parallel backend registered for foreach. Read the doParallel vignette to learn how to do that.

Simple example from the vignette:

    library(doParallel)
    cl <- makeCluster(3)
    registerDoParallel(cl)
    foreach(i = 1:3) %dopar% sqrt(i)
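
Applied to the question's f3, the same idea might look like the sketch below. It is not taken from the original answer. It assumes the doParallel, foreach and data.table packages are installed; the demo file names, the nWorkers argument and its default of 2 are hypothetical choices made for illustration. Because the workers on Windows are separate R sessions, `.packages = "data.table"` is passed so that fread is available on each of them.

```r
library(data.table)
library(parallel)
library(foreach)
library(doParallel)

# Hypothetical demo data: three small CSV files in a temporary directory
demoFiles <- file.path(tempdir(), sprintf("demo_%d.csv", 1:3))
for (f in demoFiles) {
  write.csv(data.frame(x = 1:5, y = rnorm(5)), f, row.names = FALSE)
}

# Variant of f3 that registers a backend before calling %dopar%
f3 <- function(allFiles, nWorkers = 2) {  # nWorkers is an illustrative parameter
  cl <- makeCluster(nWorkers)             # socket cluster, works without forking
  on.exit({
    stopCluster(cl)                       # release the worker processes
    registerDoSEQ()                       # do not leave a stopped cluster registered
  }, add = TRUE)
  registerDoParallel(cl)                  # this is the step missing in the question

  # Each worker loads data.table and reads one file; results come back as a list
  parts <- foreach(myFile = allFiles,
                   .packages = "data.table",
                   .inorder = FALSE) %dopar% fread(myFile)
  rbindlist(parts)                        # bind once at the end
}

DT <- f3(demoFiles)
```

Collecting the pieces in a list and calling rbindlist once avoids repeated pairwise rbind calls, which is the same binding approach used in f1 and f2. To confirm that a backend is active before timing anything, foreach provides getDoParRegistered() and getDoParWorkers().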