Slower ddply when .parallel=TRUE on Mac OS X Version 10.6.7
Question
I am trying to get ddply to run in parallel on my mac. The code I've used is as follows:
library(doMC)
library(plyr)    # ddply lives in plyr; load it explicitly in case ggplot2 does not attach it
library(ggplot2) # for the purposes of getting the baseball data.frame
registerDoMC(2)
> system.time(ddply(baseball, .(year), numcolwise(mean)))
user system elapsed
0.959 0.106 1.522
> system.time(ddply(baseball, .(year), numcolwise(mean), .parallel=TRUE))
user system elapsed
2.221 2.790 2.552
Why is ddply slower when I run .parallel=TRUE? I have searched online to no avail. I've also tried registerDoMC() and the results were the same.
Recommended answer
The baseball data may be too small to see improvement by making the computations parallel; the overhead of passing the data to the different processes may be swamping any speedup by doing the calculations in parallel. Using the rbenchmark package:
library(rbenchmark)  # provides benchmark()

# Stack 10 copies of baseball to get a data set 10 times larger
baseball10 <- baseball[rep(seq_len(nrow(baseball)), 10), ]
benchmark(noparallel   = ddply(baseball, .(year), numcolwise(mean)),
          parallel     = ddply(baseball, .(year), numcolwise(mean), .parallel=TRUE),
          noparallel10 = ddply(baseball10, .(year), numcolwise(mean)),
          parallel10   = ddply(baseball10, .(year), numcolwise(mean), .parallel=TRUE),
          replications = 10)
This gives the following results:
          test replications elapsed relative user.self sys.self user.child sys.child
1   noparallel           10   4.562 1.000000     4.145    0.408      0.000     0.000
3 noparallel10           10  14.134 3.098203     9.815    4.242      0.000     0.000
2     parallel           10  11.927 2.614423     2.394    1.107      4.836     6.891
4   parallel10           10  18.406 4.034634     4.045    2.580     10.210     9.769
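With a data set 10 times larger, the penalty for running in parallel is smaller: on the original data the parallel run takes about 2.6 times as long as the serial one (11.927 vs. 4.562 elapsed), but on the larger data only about 1.3 times as long (18.406 vs. 14.134 elapsed). A more complicated computation would tilt it even further in parallel's favor, likely giving it an advantage.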
This was run on a Mac OS X 10.5.8 Core 2 Duo machine.
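To illustrate the point about more complicated computations, here is a minimal sketch that compares a cheap and a costly per-group function, each run serially and with forked workers. It is not the benchmark above: it swaps ddply and doMC for lapply and mclapply from the base parallel package so that it runs without extra packages. The helper time_both, the simulated groups, and the cheap and costly functions are all made up for illustration. Timings depend on the machine, so none are shown.

```r
library(parallel)  # ships with R; mclapply forks workers (mc.cores > 1 is not supported on Windows)

# Hypothetical helper: elapsed time for applying f to every group,
# once serially with lapply and once with forked workers via mclapply.
time_both <- function(groups, f, cores = 2) {
  serial   <- system.time(lapply(groups, f))[["elapsed"]]
  parallel <- system.time(mclapply(groups, f, mc.cores = cores))[["elapsed"]]
  c(serial = serial, parallel = parallel)
}

set.seed(1)
# Simulated stand-in for grouped data: 100 groups of 200 values each
groups <- split(rnorm(2e4), rep(1:100, each = 200))

# Very little work per group, comparable to numcolwise(mean)
cheap <- function(x) mean(x)

# Much more work per group: a bootstrap of the median with 200 resamples
costly <- function(x) mean(replicate(200, median(sample(x, replace = TRUE))))

time_both(groups, cheap)   # forking overhead is likely to dominate here
time_both(groups, costly)  # per-group work is more likely to outweigh the overhead
```

If the costly case shows a clear gain while the cheap one does not, that is consistent with the explanation above: the fixed cost of starting workers and moving data only pays off once each group carries enough work.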