What is the easiest way to parallelize a vectorized function in R?
Question
I have a very large list X and a vectorized function f. I want to calculate f(X), but this will take a long time if I do it with a single core. I have (access to) a 48-core server. What is the easiest way to parallelize the calculation of f(X)? The following is not the right answer:
library(foreach)
library(doMC)
registerDoMC()
foreach(x=X, .combine=c) %dopar% f(x)
The above code will indeed parallelize the calculation of f(X), but it will do so by applying f separately to every element of X. This ignores the vectorized nature of f and will probably make things slower as a result, not faster. Rather than applying f elementwise to X, I want to split X into reasonably sized chunks and apply f to those.
So, should I just manually split X into 48 equal-sized sublists, apply f to each in parallel, and then manually put the results together? Or is there a package designed for this?
In case anyone is wondering, my specific use case is here.
Recommended answer
The itertools package was designed to address this kind of problem. In this case, I would use isplitVector:
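library(itertools)  # provides isplitVector(); foreach and a registered backend are assumed, as in the question
n <- getDoParWorkers()  # number of workers registered with foreach
foreach(x=isplitVector(X, chunks=n), .combine='c') %dopar% f(x)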
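For comparison, the manual split-apply-combine that the question describes can be sketched with only the base parallel package. This is a minimal illustration, not a replacement for the foreach code: X and f here are small hypothetical stand-ins for the asker's data and function, and the worker count of 2 is an arbitrary choice for the example.

```r
library(parallel)

# Hypothetical stand-ins for the question's X and f.
X <- as.list(1:1000)
f <- function(chunk) sqrt(unlist(chunk))  # vectorized over a whole chunk

n <- 2                # number of workers (assumed; the 48-core server allows more)
cl <- makeCluster(n)  # socket cluster, which is also available on Windows

# splitIndices() returns n roughly equal, contiguous blocks of indices.
idx <- splitIndices(length(X), n)
chunks <- lapply(idx, function(i) X[i])

# One call to f per chunk, then concatenate the pieces in their original order.
result <- do.call(c, parLapply(cl, chunks, f))
stopCluster(cl)
```

Each worker receives one whole chunk, so f is called n times rather than once per element, which is the property the question asks for.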
For this example, pvec (from the parallel package) is undoubtedly faster and simpler, but the foreach approach can also be used on Windows, for example with the doParallel package.
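A rough sketch of the pvec route, assuming a numeric vector and a simple vectorized function as stand-ins for X and f (both are hypothetical, as is the choice of 2 cores). Since pvec relies on forking, the sketch falls back to a single core on Windows.

```r
library(parallel)

# Hypothetical stand-ins for the question's X and f.
X <- runif(1e5)
f <- function(x) sqrt(x) + 1  # any vectorized function

# pvec() forks, so more than one core is only usable on Unix-alikes.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# pvec() splits X into contiguous segments, applies f to each segment
# on its own core, and concatenates the results in order.
result <- pvec(X, f, mc.cores = cores)
```

The result should match a plain call to f(X); only the way the work is distributed differs.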