mclapply vs for loops for plotting: speed and scalability focus


Problem description

I am running a function in R that can take a long time to run, as it carries out multiple commands to transform and subset some data before pushing it into ggplot to plot. I need to run this function multiple times, adjusting the argument values. The example I provide is a simple one, but how can I speed it up when it is scaled up, i.e. what is the fastest way of getting every single combination? Is there a generic method of converting for loops into mclapply, assuming that is faster? Please feel free to provide alternative mock examples that demonstrate a preference for a particular method.

mock example:

the basic function:

library(ggplot2)

# Simulate n normal draws and return a scatter plot of them
ff <- function(n, mu, stdev){
     x1 <- c(1:n)
     y1 <- rnorm(n,mu,stdev)
     z1 <- data.frame(x1,y1)
     ggplot(z1, aes(x=x1,y=y1))+
       geom_point()+
       labs(title=paste("n=",n,"mu=",mu, "stdev=",stdev))
}

So the naive way of going through the parameters would be to do the following:

for(i in 1:10){
    for(j in 1:2){
       for(k in seq(100,500,by=100)){
         ff(k,i,j)
       }
    }
}

What would be the fastest way of speeding this up? I'm assuming it might need something like expand.grid(x=c(1:10),y=c(1:2),z=seq(100,500,by=100)) and then using mclapply to run through each row in some sort of parallel manner (I have 4 cores available for this). Please feel free to pull bits out of the basic function, or put things into it, according to whichever method gives the greatest improvement in speed. The process will obviously take longer if you increase the range of each parameter. Is there nothing that can be done about that, or can it be improved by splitting the work across more cores?

And for bonus points: is there anything that will save the output images and create sliders, like in the package manipulate, to go through all the parameters interactively, so that all it does is pull out the relevant image rather than recalculating it each time?

N.B. Please feel free to use or suggest any other packages (like foreach) that you think might be useful for your solution.

Solution

If using mclapply, combine the parameters into a list and pass that to the function rather than using a for loop.

e.g.

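library(ggplot2)

# One row per parameter combination (10 * 2 * 5 = 100 rows)
df <- expand.grid(i = 1:10, j = 1:2 , k = seq(100, 500, 100))
# One list(n, mu, stdev) per row; k maps to n, i to mu, j to stdev, as in ff(k, i, j)
params <- mapply(list, n = df[, 3], mu = df[, 1], stdev = df[, 2], SIMPLIFY = FALSE)

# ff now takes a single list of parameters instead of three arguments
ff <- function(tlist) {
    n <- tlist$n
    mu <- tlist$mu
    stdev <- tlist$stdev
    x1 <- c(1:n)
    y1 <- rnorm(n,mu,stdev)
    z1 <- data.frame(x1,y1)
    ggplot(z1, aes(x=x1,y=y1))+
      geom_point()+
      labs(title=paste("n=",n,"mu=",mu, "stdev=",stdev))
}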

library(plyr)  # provides llply
results <- llply(params, ff, .progress='text')
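On the question of a generic way to turn nested for loops into a single apply call, the following is a minimal sketch in base R rather than part of the original answer. It assumes a hypothetical stand-in function, label_run, that returns only the plot title so the example runs without ggplot2. Map is used here as an alternative to building a list of parameter lists by hand.

```r
# Hypothetical stand-in for ff(): returns the plot title instead of a ggplot object
label_run <- function(n, mu, stdev) {
  paste("n=", n, "mu=", mu, "stdev=", stdev)
}

# Nested loops, collecting each result in a list
loop_out <- list()
for (i in 1:10) {
  for (j in 1:2) {
    for (k in seq(100, 500, by = 100)) {
      loop_out[[length(loop_out) + 1]] <- label_run(k, i, j)
    }
  }
}

# The same combinations as one row per call
grid <- expand.grid(i = 1:10, j = 1:2, k = seq(100, 500, by = 100))

# Map() walks the columns in parallel, calling label_run once per row
flat_out <- Map(label_run, n = grid$k, mu = grid$i, stdev = grid$j)

# Both approaches cover the same 100 combinations, though in a different order:
# expand.grid varies its first column fastest, the loops vary k fastest
length(flat_out)
setequal(unlist(loop_out), unlist(flat_out))
```

Once the loops are flattened like this, the call over the rows can be handed to a parallel equivalent without changing the function itself.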

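If using mclapply:

library(parallel)  # provides mclapply; it relies on forking, so mc.cores > 1 is not supported on Windows
results <- mclapply(params, ff, mc.cores = 4, mc.preschedule = TRUE)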
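To check whether the parallel version actually helps on a given machine, a rough sketch like the one below can be used. It is an illustration rather than the original answer: summarise_run is a hypothetical stand-in that simulates the data but skips the plotting, and the core count of 2 is an arbitrary assumption. No timings are claimed here, since for small tasks the cost of forking can outweigh the gain.

```r
library(parallel)

# Parameter grid as a list of list(n, mu, stdev), one entry per combination
grid <- expand.grid(i = 1:10, j = 1:2, k = seq(100, 500, by = 100))
params <- mapply(list, n = grid$k, mu = grid$i, stdev = grid$j, SIMPLIFY = FALSE)

# Hypothetical stand-in for the expensive step: simulate and summarise, no plotting
summarise_run <- function(p) {
  y1 <- rnorm(p$n, p$mu, p$stdev)
  c(n = p$n, mu = p$mu, stdev = p$stdev, n_points = length(y1))
}

# Forking is unavailable on Windows, where mc.cores must be 1
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Time the serial and the parallel run on the same input
serial_time <- system.time(
  serial <- lapply(params, summarise_run)
)
parallel_time <- system.time(
  parallel_res <- mclapply(params, summarise_run, mc.cores = n_cores)
)

# Compare elapsed wall-clock time for the two runs
serial_time[["elapsed"]]
parallel_time[["elapsed"]]
```

Swapping summarise_run for the real plotting function gives a like-for-like comparison on the actual workload, which is what matters when deciding whether the extra cores are worth it.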
