Slower ddply when .parallel=TRUE on Mac OS X Version 10.6.7


Question

I am trying to get ddply to run in parallel on my Mac. The code I've used is as follows:

library(plyr)   # provides ddply, numcolwise, and the baseball data.frame
library(doMC)   # foreach backend that ddply uses when .parallel=TRUE
registerDoMC(2) # register two worker processes


> system.time(ddply(baseball, .(year), numcolwise(mean)))
   user  system elapsed 
  0.959   0.106   1.522 
> system.time(ddply(baseball, .(year), numcolwise(mean), .parallel=TRUE))
   user  system elapsed 
  2.221   2.790   2.552 

Why is ddply slower when I run .parallel=TRUE? I have searched online to no avail. I've also tried registerDoMC() and the results were the same.

Recommended answer

The baseball data may be too small to see improvement by making the computations parallel; the overhead of passing the data to the different processes may be swamping any speedup by doing the calculations in parallel. Using the rbenchmark package:

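library(rbenchmark) # provides benchmark()

# Stack ten copies of baseball to get a data set ten times as large
baseball10 <- baseball[rep(seq_len(nrow(baseball)), 10),]

# Time serial and parallel ddply on both the original and the enlarged data
benchmark(noparallel = ddply(baseball, .(year), numcolwise(mean)),
    parallel = ddply(baseball, .(year), numcolwise(mean), .parallel=TRUE),
    noparallel10 = ddply(baseball10, .(year), numcolwise(mean)),
    parallel10 = ddply(baseball10, .(year), numcolwise(mean), .parallel=TRUE),
    replications = 10)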

gives the results:

          test replications elapsed relative user.self sys.self user.child sys.child
1   noparallel           10   4.562 1.000000     4.145    0.408      0.000     0.000
3 noparallel10           10  14.134 3.098203     9.815    4.242      0.000     0.000
2     parallel           10  11.927 2.614423     2.394    1.107      4.836     6.891
4   parallel10           10  18.406 4.034634     4.045    2.580     10.210     9.769

With a 10 times bigger data set, the penalty for parallel is smaller: the parallel run takes about 1.3 times as long as the serial one (18.406 s versus 14.134 s), compared with about 2.6 times on the original data (11.927 s versus 4.562 s). A more complicated computation would tilt it even further in parallel's favor, likely giving it an advantage.

This was run on a Mac OS X 10.5.8 Core 2 Duo machine.
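
The point about more complicated computations can be tried out with a rough sketch like the one below. It is only an illustration, not something from the benchmark above: slow_summary is a made-up helper that bootstraps the mean of the h column 2000 times per year so that each group does far more work than numcolwise(mean). Whether the parallel run actually comes out ahead will depend on the machine and the number of cores, so no timings are claimed here.

```r
library(plyr)   # ddply and the baseball data.frame
library(doMC)   # fork-based foreach backend (Unix-alikes only)
registerDoMC(2) # two worker processes, as in the question

# Hypothetical helper (not part of plyr): an artificially expensive
# per-group summary that bootstraps the mean of the hits column `h`.
slow_summary <- function(df) {
  boot_means <- replicate(2000, {
    idx <- sample.int(nrow(df), replace = TRUE) # resample row indices
    mean(df$h[idx], na.rm = TRUE)
  })
  data.frame(n = nrow(df), boot_se = sd(boot_means))
}

# Same call twice: once serial, once with the registered parallel backend
serial_time <- system.time(
  res_serial <- ddply(baseball, .(year), slow_summary)
)
parallel_time <- system.time(
  res_parallel <- ddply(baseball, .(year), slow_summary, .parallel = TRUE)
)

# Compare wall-clock (elapsed) seconds
print(c(serial   = serial_time[["elapsed"]],
        parallel = parallel_time[["elapsed"]]))
```

The boot_se values will differ between the two runs because the resampling is random; only the timings and the group sizes are meant to be compared. Raising the 2000 replications increases the work per group and should make the fixed cost of shipping data to the workers matter less.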
