Parallel computation: Loading packages in each thread only once


Question

I am currently working with some large datasets, so parallelizing the workflows is the only way to go.

I need to load some packages into each thread once at the beginning (i.e. for(this.thread in threads) { #load some packages }).

Unfortunately, I'm not sure how to do that.

The following code further illustrates my problem, where I am trying to use the pipe operator from magrittr inside a %dopar% loop:


library(parallel)
library(doParallel)
library(foreach)
library(magrittr)


# Generate some random data and function :
# -----------------------------------------

randomData = runif(10^3)
randomFunction = function(x) {x * (2^x) } 

randomData[1] %>% randomFunction #Works



# And now ... The parallel part :
# --------------------------------

myCluster = makeCluster(6)
registerDoParallel(myCluster)


# Test that the do par is up and running: 
foreach(i = randomData) %dopar% { i }


# Use magrittr pipe operator: 
# Error in { : task 1 failed - "could not find function "%>%""
foreach(i = randomData) %dopar% { i %>% randomFunction }


# Load the library at each loop: (ie: length(data) times !)
# Other than unnecessarily loading the library (length(data) - numberOfThreads) times, 
# it works nicely
foreach(i = randomData) %dopar% { library(magrittr);  i %>% randomFunction }


# Now try without re-loading: 
# Tararaa - (ie: Works nicely)
foreach(i = randomData) %dopar% { i %>% randomFunction }


Any ideas?

Recommended answer

The doParallel package builds on parallel and so gives you access to some handy low-level functions from it, including clusterCall, which executes a function once on each node.

I had the exact same problem and solved it by doing:

library(doParallel)
myCluster = makeCluster(6)
registerDoParallel(myCluster)
clusterCall(myCluster, function() library(magrittr))
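
As a related sketch (not part of the original answer), the parallel package also provides clusterEvalQ, which evaluates an expression once on every worker and can be used the same way as clusterCall here. The example below assumes magrittr is installed, uses two workers instead of six to stay light, and then asks each worker whether the package ended up on its search path. The names cl and attached are illustrative.

```r
library(parallel)

# Two workers keep the sketch light; the original uses six.
cl <- makeCluster(2)

# clusterEvalQ evaluates the expression once on every worker.
invisible(clusterEvalQ(cl, library(magrittr)))

# Ask each worker whether magrittr is now on its search path.
attached <- unlist(clusterEvalQ(cl, "package:magrittr" %in% search()))
print(attached)

stopCluster(cl)
```

If every element of attached is TRUE, later %dopar% tasks sent to this cluster can use %>% without calling library() inside the loop body.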

You can also use the .packages argument:

foreach(i = 1:5, .packages = "magrittr") %dopar% {i %>% runif}
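
For a fuller picture, here is a minimal end-to-end sketch of the .packages approach applied to the function from the question. It assumes doParallel and magrittr are installed, uses two workers and ten data points to keep it small, and adds .combine = c plus a stopCluster call, neither of which appears in the original code. The names parallelResult and sequentialResult are illustrative.

```r
library(doParallel)  # also attaches foreach and parallel
library(magrittr)    # attached on the master too, as in the question

randomFunction <- function(x) x * (2^x)
randomData <- runif(10)

myCluster <- makeCluster(2)
registerDoParallel(myCluster)

# .packages tells foreach to make magrittr available on the workers,
# so the loop body does not need its own library() call.
parallelResult <- foreach(i = randomData, .combine = c,
                          .packages = "magrittr") %dopar% {
  i %>% randomFunction
}

# Release the worker processes when done.
stopCluster(myCluster)

# Compare against the plain vectorised computation.
sequentialResult <- randomFunction(randomData)
print(all.equal(parallelResult, sequentialResult))
```

Comparing against the sequential result is a cheap way to confirm that the workers really did evaluate the piped expression rather than failing silently.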
