Multicore with plyr, MC
Problem description
Hi, I am trying to use ddply from the plyr library in R together with the doMC package. It doesn't seem to speed up the computation. This is the code I run:
require(plyr)
require(doMC)
registerDoMC(4)
getDoParWorkers()
##> 4
test <- data.frame(x=1:10000, y=rep(c(1:20), 500))
system.time(ddply(test, "y", mean))
# user system elapsed
# 0.015 0.000 0.015
system.time(ddply(test, "y", mean, .parallel=TRUE))
# user system elapsed
# 223.062 2.825 1.093
Any ideas?
Recommended answer
The mean function operates too quickly relative to the communication costs required to distribute the split sections to each core and retrieve the results.
This is a common "problem" people run into with distributed computing. They expect it to make everything run faster because they forget there are costs (communication between the nodes) as well as benefits (using multiple cores).
Something specific to parallel processing in plyr: only the applied function is run on multiple cores. The splitting and combining are still done on a single core, so the function you're applying has to be very computationally intensive for you to see a benefit from using plyr functions in parallel.
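To make the trade-off concrete, here is a minimal sketch that reuses the data frame from the question but swaps mean for a deliberately slow per-group function. The name slow_mean and the 0.25 second delay are hypothetical, chosen only to stand in for an expensive computation, and the sketch assumes plyr and doMC are installed on a Unix-like machine with at least two cores.

```r
library(plyr)
library(doMC)

registerDoMC(2)  # assumption: at least 2 cores are available

test <- data.frame(x = 1:10000, y = rep(1:20, 500))

# Hypothetical stand-in for an expensive per-group computation:
# pause for a quarter of a second, then return the group mean of x.
slow_mean <- function(df) {
  Sys.sleep(0.25)
  data.frame(mean_x = mean(df$x))
}

# Same call twice: once on a single core, once spread over the workers.
t_serial   <- system.time(res_serial   <- ddply(test, "y", slow_mean))
t_parallel <- system.time(res_parallel <- ddply(test, "y", slow_mean, .parallel = TRUE))

t_serial["elapsed"]
t_parallel["elapsed"]
```

With 20 groups that each cost a noticeable amount of time, the per-group work should outweigh the cost of shipping the pieces to the workers, so the parallel call would be expected to finish sooner. With a function as cheap as mean, the same overhead dominates, which matches the timings in the question.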