Julia pmap speed - parallel processing - dynamic programming

Question

I am trying to speed up filling in a matrix for a dynamic programming problem in Julia (v0.6.0), and I can't seem to get much extra speed from using pmap. This is related to this question I posted almost a year ago: Filling a matrix using parallel processing in Julia. I was able to speed up serial processing with some great help then, and I'm now trying to get extra speed from parallel processing tools in Julia.

For the serial processing case, I was using a 3-dimensional array (essentially a set of equally sized matrices, indexed by the 1st dimension) and iterating over the 1st dimension. I wanted to give pmap a try, though, to iterate over the set of matrices more efficiently.

Here is the code setup. To use pmap with the v_iter function below, I converted the three-dimensional array into a dictionary object, with the dictionary keys equal to the index values in the 1st dimension (v_dict in the code below, with gcc equal to the size of the 1st dimension). The v_iter function takes other dictionary objects (E_opt_dict and gridpoint_m_dict below) as additional inputs:

function v_iter(a,b,c)
   diff_v = 1
   while diff_v>convcrit
     diff_v = -Inf

     #These lines efficiently multiply the value function by the Markov transition matrix, using the A_mul_B! function
     exp_v       = zeros(Float64,gkpc,1)
     A_mul_B!(exp_v,a[1:gkpc,:],Zprob[1,:])
     for j=2:gz
       temp=Array{Float64}(gkpc,1)
       A_mul_B!(temp,a[(j-1)*gkpc+1:(j-1)*gkpc+gkpc,:],Zprob[j,:])
       exp_v=hcat(exp_v,temp)
     end    

     #This tries to find the optimal value of v
     for h=1:gm
       for j=1:gz
         oldv = a[h,j]
         newv = (1-tau)*b[h,j]+beta*exp_v[c[h,j],j]
         a[h,j] = newv
         diff_v = max(diff_v, oldv-newv, newv-oldv)
       end
     end
   end
end

gz =  9  
gp =  13  
gk =  17  
gcc =  5  
gm    = gk * gp * gcc * gz
gkpc  = gk * gp * gcc
gkp = gk*gp
beta  = ((1+0.015)^(-1))
tau        = 0.35
Zprob = [0.43 0.38 0.15 0.03 0.00 0.00 0.00 0.00 0.00; 0.05 0.47 0.35 0.11 0.02 0.00 0.00 0.00 0.00; 0.01 0.10 0.50 0.30 0.08 0.01 0.00 0.00 0.00; 0.00 0.02 0.15 0.51 0.26 0.06 0.01  0.00 0.00; 0.00 0.00 0.03 0.21 0.52 0.21 0.03 0.00 0.00 ; 0.00 0.00  0.01  0.06 0.26 0.51 0.15 0.02 0.00 ; 0.00 0.00 0.00 0.01 0.08 0.30 0.50 0.10 0.01 ; 0.00 0.00 0.00 0.00 0.02 0.11 0.35 0.47 0.05; 0.00 0.00 0.00 0.00 0.00 0.03 0.15 0.38 0.43]
convcrit = 0.001   # chosen convergence criterion

E_opt                  = Array{Float64}(gcc,gm,gz)    
fill!(E_opt,10.0)

gridpoint_m   = Array{Int64}(gcc,gm,gz)
fill!(gridpoint_m,fld(gkp,2)) 

v_dict=Dict(i => zeros(Float64,gm,gz) for i=1:gcc)
E_opt_dict=Dict(i => E_opt[i,:,:] for i=1:gcc)
gridpoint_m_dict=Dict(i => gridpoint_m[i,:,:] for i=1:gcc) 
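
For readers on Julia 1.x: A_mul_B! and the Array{Float64}(m,n) constructor used above belong to the v0.6 era. The expectation step of v_iter might be sketched as follows with LinearAlgebra.mul! and a preallocated result in place of repeated hcat calls. The function name expected_value and the small test sizes are illustrative assumptions, not part of the original code.

```julia
using LinearAlgebra

# Hypothetical helper: column j of the result is the j-th row block of `a`
# multiplied by row j of the transition matrix, as in the loop inside v_iter.
function expected_value(a::AbstractMatrix, Zprob::AbstractMatrix, gkpc::Integer)
    gz = size(Zprob, 1)
    exp_v = Matrix{Float64}(undef, gkpc, gz)   # preallocate instead of hcat
    for j in 1:gz
        rows = ((j - 1) * gkpc + 1):(j * gkpc)
        # Views avoid copying the slices; mul! writes straight into column j.
        mul!(view(exp_v, :, j), view(a, rows, :), view(Zprob, j, :))
    end
    return exp_v
end

# Small illustrative sizes (assumed; much smaller than in the question).
gkpc, gz = 2, 3
a = reshape(collect(1.0:(gkpc * gz * gz)), gkpc * gz, gz)
Zprob = [0.5 0.5 0.0; 0.25 0.5 0.25; 0.0 0.5 0.5]
exp_v = expected_value(a, Zprob, gkpc)
```

This only restates the existing computation in newer syntax; whether it changes the timings reported below has not been measured here.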

For parallel processing, I executed the following commands:

wp = CachingPool(workers())
addprocs(3)
pmap(wp,v_iter,values(v_dict),values(E_opt_dict),values(gridpoint_m_dict))

...which produced this performance:

135.626417 seconds (3.29 G allocations: 57.152 GiB, 3.74% gc time)

I then tried serial processing instead:

for i=1:gcc
    v_iter(v_dict[i],E_opt_dict[i],gridpoint_m_dict[i])
end

...and got better performance:

128.263852 seconds (3.29 G allocations: 57.101 GiB, 4.53% gc time)

This also gives me about the same performance as running v_iter on the original 3-dimensional objects:

v=zeros(Float64,gcc,gm,gz)
for i=1:gcc
    v_iter(v[i,:,:],E_opt[i,:,:],gridpoint_m[i,:,:])
end

I know that parallel processing involves setup time, but when I increase the value of gcc, I still get about equal processing time for serial and parallel. This seems like a good candidate for parallel processing, since there is no need for messaging between the workers! But I can't seem to make it work efficiently.

Recommended answer

You create the CachingPool before adding the worker processes, so the pool passed to pmap contains only a single worker. You can check this by inspecting wp.workers, which will show something like Set([1]). The first two commands should therefore be swapped:

addprocs(3)
wp = CachingPool(workers())

You could also start Julia with the -p command-line option, e.g. julia -p 3, in which case you can skip the addprocs(3) call.
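
A minimal sketch of the corrected ordering, written for Julia 1.x rather than the v0.6.0 used in the question (an assumption: in 1.x the Distributed standard library must be loaded and pmap takes the function first, then the pool). toy_iter is a hypothetical stand-in for v_iter, and the sketch assumes a fresh session with no workers added yet.

```julia
using Distributed

# Add the workers first ...
addprocs(3)

# ... and only then build the pool, so that it contains the new workers.
wp = CachingPool(workers())

# Hypothetical stand-in for v_iter; it has to be defined on every worker.
@everywhere function toy_iter(a, b)
    a .= 0.5 .* a .+ b
    return a   # returned explicitly: remote workers operate on copies of the inputs
end

a_list = [fill(float(i), 2, 2) for i in 1:5]
b_list = [ones(2, 2) for _ in 1:5]

# Julia 1.x argument order: function first, then the worker pool.
results = pmap(toy_iter, wp, a_list, b_list)
```

Because the arguments are serialized to the workers, in-place updates made there do not reach the arrays on the master process, which is why the sketch returns the updated matrix and collects it in results.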

On top of that, your for and pmap loops are not equivalent. A Julia Dict is a hash map and, as in other languages, it does not guarantee any element order. In your for loop you are therefore guaranteed to get the matching i-th element of each container, whereas with values the order of the values need not match the key order (and in principle it could differ between the three dictionaries in the pmap call). Since the keys of your Dicts are just the numbers from 1 to gcc, you should simply use arrays instead. You can build them with comprehensions, whose syntax is very similar to Python's. For example, instead of

v_dict=Dict(i => zeros(Float64,gm,gz) for i=1:gcc)

use

v_dict_a = [zeros(Float64,gm,gz) for i=1:gcc]
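
A small sketch of the array-based setup suggested above, using Julia 1.x syntax and deliberately tiny sizes (both are assumptions; the names E_opt_a and firsts are illustrative). Each slice of E_opt is tagged with its index i so the pairing is visible, and map stands in for pmap since both walk their array arguments in lockstep.

```julia
# Small illustrative sizes, not the ones from the question.
gcc, gm, gz = 5, 4, 3

# Tag every slice along the 1st dimension with its own index i.
E_opt = [Float64(i) for i in 1:gcc, _ in 1:gm, _ in 1:gz]

# Arrays of matrices: element i always belongs to index i.
v_dict_a = [zeros(Float64, gm, gz) for i in 1:gcc]
E_opt_a  = [E_opt[i, :, :] for i in 1:gcc]

# map (like pmap) pairs the i-th elements of its array arguments.
firsts = map((v, e) -> e[1, 1], v_dict_a, E_opt_a)
```

With arrays, the i-th call always receives the i-th matrix from each input, which is the guarantee the Dict-based version lacks.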

Hope that helps.
