What is the easiest way to parallelize a vectorized function in R?


Question

I have a very large list X and a vectorized function f. I want to calculate f(X), but this will take a long time if I do it with a single core. I have (access to) a 48-core server. What is the easiest way to parallelize the calculation of f(X)? The following is not the right answer:

library(foreach)
library(doMC)
registerDoMC()

foreach(x=X, .combine=c) %dopar% f(x)

The above code will indeed parallelize the calculation of f(X), but it will do so by applying f separately to every element of X. This ignores the vectorized nature of f and will probably make things slower as a result, not faster. Rather than applying f elementwise to X, I want to split X into reasonably-sized chunks and apply f to those.

So, should I just manually split X into 48 equal-sized sublists and then apply f to each in parallel, then manually put together the result? Or is there a package designed for this?

In case anyone is wondering, my specific use case is here.
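The manual approach described above (split X into chunks, apply f to each chunk in parallel, then reassemble) can be sketched with the base `parallel` package alone. This is only an illustration under stated assumptions: `X` and `f` below are hypothetical stand-ins for the real data and function, and the worker count is set to 2 rather than 48 so the sketch runs anywhere.

```r
library(parallel)  # ships with base R

# Hypothetical stand-ins for the question's X and f.
X <- seq_len(1e5) / 7
f <- function(x) sqrt(x) + 1

# mclapply() forks, which Windows does not support, so use one core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One contiguous block of indices per worker.
idx <- splitIndices(length(X), n_cores)

# Call f once per chunk (not once per element), in parallel.
pieces <- mclapply(idx, function(i) f(X[i]), mc.cores = n_cores)

# Chunks come back in order, so concatenating restores the original order.
result <- unlist(pieces, use.names = FALSE)
```

Because `f` is called once per chunk, its vectorized implementation is still used inside each worker; only the splitting and recombining are done by hand.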

Recommended answer

The itertools package was designed to address this kind of problem. In this case, I would use isplitVector:

library(foreach)
library(itertools)
n <- getDoParWorkers()
foreach(x=isplitVector(X, chunks=n), .combine='c') %dopar% f(x)

For this example, pvec (from the parallel package) is undoubtedly faster and simpler. However, pvec relies on forking, which is not available on Windows, whereas the foreach approach can be used on Windows with the doParallel package, for example.
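For comparison, a minimal sketch of the pvec alternative mentioned above. `X` and `f` are again hypothetical stand-ins, and `mc.cores = 2` is an arbitrary choice for illustration; on the 48-core server from the question a larger value would be passed instead.

```r
library(parallel)  # ships with base R

# Hypothetical stand-ins for the question's X and f.
X <- seq(0, 10, length.out = 1e5)
f <- function(x) sqrt(x) + 1

# pvec() forks, so only one core can be used on Windows.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# pvec() splits X into n_cores contiguous pieces, calls f on each piece
# in a separate process, and concatenates the results in order.
result <- pvec(X, f, mc.cores = n_cores)
```

Note that pvec assumes `f` returns a vector of the same length as its input, which is the case for an elementwise vectorized function like the one sketched here.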
