R: Get list and environment of all variables and functions within a given function (for parallel processing)


Question

I am using foreach for parallel processing, which requires manually exporting functions and variables, passed as a list of names, to the environments of the worker processes. I want to automate this and cover all use cases. That is easy for simple functions that use only their own local variables. It gets complicated as soon as the function to be run in parallel uses functions and variables defined in another environment. Consider the following case:

global.variable <- 3

global.function <-function(j){
  res <- j^2
  return(res)
}

compute.in.parallel <-function(i){
  res <- global.function(i+global.variable)
  return(res)
}

pop <- seq(10)

do <- function(pop,fun){
  require(doParallel)
  require(foreach)
  cl <- makeCluster(16)
  registerDoParallel(cl)
  clusterExport(cl,list("global.variable","global.function"),envir=globalenv())
  results <- foreach(i=pop) %dopar% fun(i)
  stopCluster(cl)
  return(results)
}

do(pop,compute.in.parallel)

This works because I also manually export global.variable and global.function to the workers (note that compute.in.parallel itself is picked up automatically): clusterExport(cl,list("global.variable","global.function"),envir=globalenv())

But I want to do this automatically, which requires building a list of the names of all variables and functions that are used (but not defined, passed, or contained) within compute.in.parallel. How do I do this?

My current workaround is to dump all available variables to the workers:

clusterExport(cl,as.list(unique(c(ls(.GlobalEnv),ls(environment())))),envir=environment())

This is unsatisfactory, however: it misses variables in package namespaces and other hidden environments, and it generally passes far too many variables to the workers, creating significant overhead with every parallel run.

Does anyone have suggestions for improvement?

Recommended answer

The future framework automatically identifies and exports globals by default. The doFuture package provides a generic future backend adaptor for foreach. If you use that, the following works:

do <- function(pop, fun) {
  library("doFuture")
  registerDoFuture()
  cl <- parallel::makeCluster(2)
  old_plan <- plan(cluster, workers = cl)
  on.exit({
    plan(old_plan)
    parallel::stopCluster(cl)
  })

  foreach(i = pop) %dopar% fun(i)
}
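
If you would rather keep an explicit clusterExport call, the list of names can also be derived by static code inspection. The following is only a minimal sketch, not what the future framework does internally: it assumes the codetools package (shipped with standard R distributions) and uses its findGlobals function. The name user.globals and the filtering step are illustrative choices, and the one-line function definitions are shortened versions of the ones in the question.

```r
library(codetools)  # assumption: codetools is installed (it ships with standard R)

global.variable <- 3
global.function <- function(j) j^2
compute.in.parallel <- function(i) global.function(i + global.variable)

# Environment in which compute.in.parallel was defined
env <- environment(compute.in.parallel)

# merge = FALSE returns the free names split into $functions and $variables
found <- findGlobals(compute.in.parallel, merge = FALSE)

# Keep only the names defined directly in that environment;
# this drops base functions such as `+`
user.globals <- Filter(
  function(name) exists(name, envir = env, inherits = FALSE),
  c(found$functions, found$variables)
)

print(user.globals)
```

The resulting character vector could then be passed on as clusterExport(cl, user.globals, envir = env). Note that this sketch looks only one level deep: globals used inside global.function itself are not followed, and names living in package namespaces are filtered out rather than handled.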
