Two OpenMP ordered blocks with no resulting parallelization

Problem description

I am writing a Fortran program that needs to have reproducible results (for publication). My understanding of the following code is that it should be reproducible.

program main
implicit none
real(8) :: ybest,xbest,x,y
integer :: i

ybest = huge(0d0)
!$omp parallel do ordered private(x,y) shared(ybest,xbest) schedule(static,1)
do i = 1,10
    !$omp ordered
    !$omp critical
    call random_number(x)
    !$omp end critical
    !$omp end ordered

    ! Do a lot of work
    call sleep(1)
    y = -1d0

    !$omp ordered
    !$omp critical
    if (y<ybest) then
        ybest = y
        xbest = x
    end if
    !$omp end critical
    !$omp end ordered
end do
!$omp end parallel do

end program

In my case, there is a function in place of "sleep" that takes a long time to compute, and I want it done in parallel. According to the OpenMP standard, should sleep in this example execute in parallel? I thought it should (based on the question "How does the omp ordered clause work?"), but with gfortran 5.2.0 (mac) and gfortran 5.1.0 (linux) it is not executing in parallel (at least, there is no speedup from it). The timing results are below.

Also, my guess is the critical statements are not necessary, but I wasn't completely sure.

Thanks.

-EDIT-

In response to Vladmir's comments, I added a full working program with timing results.

#!/bin/bash
mpif90 main.f90
time ./a.out
mpif90 main.f90 -fopenmp
time ./a.out

The code runs with the following times (first without OpenMP, then with -fopenmp):

real    0m10.047s
user    0m0.003s
sys 0m0.003s

real    0m10.037s
user    0m0.003s
sys 0m0.004s

BUT, if you comment out the ordered blocks, it runs with the following times:

real    0m10.044s
user    0m0.002s
sys 0m0.003s

real    0m3.021s
user    0m0.002s
sys 0m0.004s

-EDIT-

    In response to innoSPG, here are the results for a non-trivial function in place of sleep:

    real(8) function f(x)
        implicit none
        real(8), intent(in) :: x
        ! local
        real(8) :: tmp
        integer :: i
        tmp = 0d0
        do i = 1,10000000
            tmp = tmp + cos(sin(x))/real(i,8)
        end do
        f = tmp
    end function
    
    
    real    0m2.229s --- no openmp
    real    0m2.251s --- with openmp and ordered
    real    0m0.773s --- with openmp but ordered commented out
    

Recommended answer

    This program is non-conforming to the OpenMP standard. Specifically, the problem is that you have more than one ordered region and every iteration of your loop will execute both of them. The OpenMP 4.0 standard has this to say (2.12.8, Restrictions, line 16, p 139):

    During execution of an iteration of a loop or a loop nest within a loop region, a thread must not execute more than one ordered region that binds to the same loop region.

    If you have more than one ordered region, you must have conditional code paths such that only one of them can be executed for any loop iteration.
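One way to satisfy this restriction for the program in the question is to remove the first ordered region altogether. The sketch below assumes it is acceptable to draw all random inputs serially before the loop; the array `x(n)`, the program name, and the stand-in function `f` are illustrative and not part of the original code. Seeding is left out, so reproducibility across runs still depends on how the generator is seeded.

```fortran
program main_single_ordered
    implicit none
    integer, parameter :: n = 10
    real(8) :: ybest, xbest, y
    real(8) :: x(n)
    integer :: i

    ybest = huge(0d0)
    xbest = 0d0

    ! Draw every random input serially, before the parallel loop,
    ! so the values do not depend on thread scheduling.
    call random_number(x)

    !$omp parallel do ordered private(y) shared(x,ybest,xbest) schedule(static,1)
    do i = 1, n
        ! Expensive work: no ordered region precedes it.
        y = f(x(i))

        ! The only ordered region executed in each iteration.
        !$omp ordered
        if (y < ybest) then
            ybest = y
            xbest = x(i)
        end if
        !$omp end ordered
    end do
    !$omp end parallel do

    print *, 'xbest =', xbest, ' ybest =', ybest

contains

    ! Hypothetical stand-in for the long-running function.
    real(8) function f(x)
        real(8), intent(in) :: x
        integer :: k
        f = 0d0
        do k = 1, 10000000
            f = f + cos(sin(x))/real(k,8)
        end do
    end function f

end program main_single_ordered
```

No critical construct is used inside the ordered region here, since an ordered region is already executed by one iteration at a time, in iteration order.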

    It is also worth noting that the position of your ordered region seems to have performance implications. Testing with gfortran 5.2, it appears that everything after the ordered region is executed in order for each loop iteration, so having the ordered block at the beginning of the loop leads to serial performance, while having the ordered block at the end of the loop does not, because the code before the block is parallelized. The effect with ifort 15 is not as dramatic, but I would still recommend structuring your code so that your ordered block occurs after any code that needs parallelization in a loop iteration rather than before.
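If the ordered region at the end of the loop is only there to pick the best result, a different technique (not the one described in this answer) is to avoid ordered entirely: store each iteration's result in an array and select the minimum serially after the loop. This is a minimal sketch under that assumption; the arrays `x(n)` and `y(n)`, the program name, and the placeholder expression for the work are all hypothetical.

```fortran
program main_no_ordered
    implicit none
    integer, parameter :: n = 10
    real(8) :: x(n), y(n)
    real(8) :: xbest, ybest
    integer :: i, ibest

    ! Inputs are generated serially, outside the parallel loop.
    call random_number(x)

    ! Each iteration writes only its own element, so no ordered or
    ! critical construct is needed inside the loop.
    !$omp parallel do shared(x,y) schedule(static,1)
    do i = 1, n
        y(i) = -x(i)**2   ! placeholder for the expensive computation
    end do
    !$omp end parallel do

    ! Serial selection: minloc returns the first position of the minimum,
    ! which matches a strict "y < ybest" update applied in loop order.
    ibest = minloc(y, dim=1)
    ybest = y(ibest)
    xbest = x(ibest)

    print *, 'xbest =', xbest, ' ybest =', ybest
end program main_no_ordered
```

The trade-off is the extra storage for `y(n)`, which is negligible for ten iterations but worth considering for very long loops.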
