Why is the parallel package slower than just using apply?


Problem Description



I am trying to determine when to use the parallel package to speed up the time necessary to run some analysis. One of the things I need to do is create matrices comparing variables in two data frames with differing numbers of rows. I asked a question about an efficient way of doing this on StackOverflow and wrote about the tests on my blog. Since I am comfortable with the best approach, I wanted to speed up the process by running it in parallel. The results below are based upon a 2 GHz i7 Mac with 8 GB of RAM. I am surprised that the parallel package, the parSapply function in particular, is worse than just using the apply function. The code to replicate this is below. Note that I am currently only using one of the two columns I create, but eventually want to use both.


[Figure: runtime comparison plot (source: bryer.org)]

require(parallel)
require(ggplot2)
require(reshape2)
set.seed(2112)
results <- list()
sizes <- seq(1000, 30000, by=5000)
pb <- txtProgressBar(min=0, max=length(sizes), style=3)
for(cnt in 1:length(sizes)) {
    i <- sizes[cnt]
    df1 <- data.frame(row.names=1:i, 
                      var1=sample(c(TRUE,FALSE), i, replace=TRUE), 
                      var2=sample(1:10, i, replace=TRUE) )
    df2 <- data.frame(row.names=(i + 1):(i + i), 
                      var1=sample(c(TRUE,FALSE), i, replace=TRUE),
                      var2=sample(1:10, i, replace=TRUE))
    tm1 <- system.time({
        df6 <- sapply(df2$var1, FUN=function(x) { x == df1$var1 })
        dimnames(df6) <- list(row.names(df1), row.names(df2))
    })
    rm(df6)
    tm2 <- system.time({
        cl <- makeCluster(getOption('cl.cores', detectCores()))
        tm3 <- system.time({
            df7 <- parSapply(cl, df1$var1, FUN=function(x, df2) { x == df2$var1 }, df2=df2)
            dimnames(df7) <- list(row.names(df1), row.names(df2))
        })
        stopCluster(cl)
    })
    rm(df7)
    results[[cnt]] <- c(apply=tm1, parallel.total=tm2, parallel.exec=tm3)
    setTxtProgressBar(pb, cnt)
}
close(pb)

toplot <- as.data.frame(results)[,c('apply.user.self','parallel.total.user.self',
                          'parallel.exec.user.self')]
toplot$size <- sizes
toplot <- melt(toplot, id='size')

ggplot(toplot, aes(x=size, y=value, colour=variable)) + geom_line() + 
    xlab('Vector Size') + ylab('Time (seconds)')

Solution

Running jobs in parallel incurs overhead. Parallelization improves overall performance only if the jobs you fire at the worker nodes take a significant amount of time. When the individual jobs take only milliseconds, the overhead of constantly firing off jobs degrades overall performance. The trick is to divide the work over the nodes in such a way that each job is sufficiently long, say at least a few seconds. I used this to great effect running six Fortran models simultaneously; those individual model runs took hours, making the overhead negligible by comparison.

Note that I haven't run your example, but the situation I describe above is often the issue when parallelization takes longer than running sequentially.
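To make this concrete for the question's code: makeCluster() is called inside every timed iteration, and df2 is shipped to the workers on every parSapply call, so per-job overhead dominates the millisecond-sized comparisons. A minimal sketch (my own rework of the asker's code, not a drop-in replacement) that amortizes the overhead: create the cluster once, export the shared data once, and hand each worker one large chunk instead of one element per job.

```r
library(parallel)

set.seed(2112)
i <- 1000
df1 <- data.frame(var1 = sample(c(TRUE, FALSE), i, replace = TRUE))
df2 <- data.frame(var1 = sample(c(TRUE, FALSE), i, replace = TRUE))

cl <- makeCluster(detectCores())   # pay the startup cost once, outside any timing
clusterExport(cl, 'df1')           # ship df1 to every worker once, not per job

# Split df2$var1 into one chunk per worker so each job does substantial work,
# then compare each chunk against df1$var1 on the workers.
chunks <- lapply(splitIndices(i, length(cl)), function(j) df2$var1[j])
res <- parLapply(cl, chunks, function(chunk) {
  vapply(chunk, function(x) x == df1$var1, logical(nrow(df1)))
})
df7 <- do.call(cbind, res)         # reassemble the i-by-i comparison matrix
stopCluster(cl)
```

Even with the overhead amortized this way, a vectorized comparison of logical vectors is so cheap that the serial sapply version may still win at these sizes; the chunked structure only starts to pay off when the per-element work is heavier.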
