Multicore with plyr, MC

Question

I am trying to use ddply from the plyr library in R together with the doMC package. It doesn't seem to speed up the computation. This is the code I run:

library(plyr)   # provides ddply
library(doMC)   # parallel backend for foreach, which plyr uses when .parallel = TRUE
registerDoMC(4)
getDoParWorkers()
##> 4
test <- data.frame(x = 1:10000, y = rep(1:20, 500))
system.time(ddply(test, "y", mean))
  #    user  system elapsed
  #   0.015   0.000   0.015
system.time(ddply(test, "y", mean, .parallel = TRUE))
  #    user  system elapsed
  # 223.062   2.825   1.093

Any ideas?

Recommended answer

The mean function runs too quickly relative to the communication cost of distributing the split pieces to each core and retrieving the results.

This is a common "problem" people run into with distributed computing. They expect it to make everything run faster because they forget that there are costs (communication between the nodes) as well as benefits (using multiple cores).
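The cost side of that trade-off can be observed without plyr at all. The following is a minimal sketch, assuming a Unix-like system (mclapply forks, so it does not run in parallel on Windows) and an arbitrary choice of 2 workers; the variable names are illustrative only. It applies the same trivial function to the same 20 pieces sequentially and through forked workers, so any difference in elapsed time comes from dispatching work and collecting results rather than from the computation itself.

```r
library(parallel)  # ships with base R

# Same grouping as in the question: 20 pieces of 500 values each.
pieces <- split(1:10000, rep(1:20, 500))

# Sequential: no dispatch cost, and mean() on 500 numbers is nearly free.
t_seq <- system.time(r_seq <- lapply(pieces, mean))

# Forked workers (assumption: 2 cores): each piece has to be handed to a
# worker and its result sent back, which can cost more than mean() itself.
t_par <- system.time(r_par <- mclapply(pieces, mean, mc.cores = 2))

print(rbind(sequential = t_seq, parallel = t_par)[, "elapsed"])
```

Both calls return the same values; only the elapsed times differ, and the exact numbers depend on the machine.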

Something specific to parallel processing in plyr: only the function itself is run on multiple cores. The splitting and combining are still done on a single core, so the function you're applying has to be very computationally intensive before you see a benefit from using plyr functions in parallel.
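To see where parallel ddply might start to pay off, the per-group function has to do real work. The sketch below reuses the data from the question and swaps mean for a hypothetical, deliberately expensive summary, slow_mean, which bootstraps the group mean; the function name, the n_boot value, and the choice of 4 workers are assumptions made for illustration and are not part of the original post. Whether the parallel run is actually faster depends on the machine and on how heavy the function is.

```r
library(plyr)
library(doMC)

registerDoMC(4)  # assumption: 4 workers, as in the question

test <- data.frame(x = 1:10000, y = rep(1:20, 500))

# Hypothetical expensive per-group function: bootstrap the mean of x.
# n_boot is an arbitrary illustrative value; raise it to make each call heavier.
slow_mean <- function(df, n_boot = 2000) {
  boot <- vapply(seq_len(n_boot),
                 function(i) mean(sample(df$x, replace = TRUE)),
                 numeric(1))
  data.frame(mean_x = mean(df$x), boot_se = sd(boot))
}

# Splitting and combining happen on one core in both runs;
# only the slow_mean calls are spread across the workers.
t_seq <- system.time(res_seq <- ddply(test, "y", slow_mean))
t_par <- system.time(res_par <- ddply(test, "y", slow_mean, .parallel = TRUE))

print(rbind(sequential = t_seq, parallel = t_par)[, "elapsed"])
```

The mean_x column is identical in both results; boot_se differs slightly between runs because the bootstrap resamples are random.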
