Nested loops parallelized with OpenMP run slowly


Problem description

I have part of a Fortran program consisting of nested loops that I want to parallelize with OpenMP.

integer :: nstates , N, i, dima, dimb, dimc, a_row, b_row, b_col, c_row, row, col
double complex, dimension(4,4):: mat
double complex, dimension(:), allocatable :: vecin,vecout 

nstates = 2
N = 24

allocate(vecin(nstates**N), vecout(nstates**N))
vecin = ...some data
vecout = 0

mat = reshape([...some data...],[4,4])

dimb=nstates**2

!$OMP PARALLEL DO PRIVATE(dima,dimc,row,col,a_row,b_row,c_row,b_col) 
do i=1,N-1
    dima=nstates**(i-1)
    dimc=nstates**(N-i-1)

    do a_row = 1, dima
        do b_row = 1,dimb
            do c_row = 1,dimc
                row = ((a_row-1)*dimb + b_row - 1)*dimc + c_row
                do b_col = 1,dimb
                    col = ((a_row-1)*dimb + b_col - 1)*dimc + c_row
                    !$OMP ATOMIC
                    vecout(row) = vecout(row) + vecin(col)*mat(b_row,b_col)
                end do
            end do
        end do
    end do
end do
!$OMP END PARALLEL DO 

The program runs and the result is correct, but it is incredibly slow, much slower than without OpenMP. I don't know much about OpenMP. Have I done something wrong in my use of PRIVATE or OMP ATOMIC? I would be grateful for any advice on how to improve the performance of my code.

Solution

If your arrays are too large and you get stack overflows with automatic reduction, you can implement the reduction yourself with allocatable temporary arrays.

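As Francois Jacq pointed out, dima and dimc are recomputed in every iteration and must therefore be private; sharing them causes a race condition. dimb is set once before the loop and only read inside it, so it must stay shared (a private copy would be undefined inside the parallel region).

double complex, dimension(:), allocatable :: tmp

! tmp is private: each thread allocates and fills its own copy
!$OMP PARALLEL PRIVATE(dima,dimc,row,col,a_row,b_row,c_row,b_col,tmp)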

allocate(tmp(size(vecout)))
tmp = 0

!$OMP DO
do i=1,N-1
    dima=nstates**(i-1)
    dimc=nstates**(N-i-1)

    do a_row = 1, dima
        do b_row = 1,dimb
            do c_row = 1,dimc
                row = ((a_row-1)*dimb + b_row - 1)*dimc + c_row
                do b_col = 1,dimb
                    col = ((a_row-1)*dimb + b_col - 1)*dimc + c_row
                    tmp(row) = tmp(row) + vecin(col)*mat(b_row,b_col)
                end do
            end do
        end do
    end do
end do
!$OMP END DO

!$OMP CRITICAL
vecout = vecout + tmp
!$OMP END CRITICAL
!$OMP END PARALLEL
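
For comparison, the "automatic reduction" mentioned above could look roughly like the sketch below. It wraps the loop nest from the question in a subroutine and replaces the ATOMIC update with a REDUCTION(+:vecout) clause. The module name two_site_apply, the subroutine name apply_reduction, and the kind parameter dp are hypothetical names chosen for illustration; they do not appear in the original code. Where each thread's private copy of vecout is stored depends on the compiler, which is presumably why this variant can overflow the stack for large arrays.

```fortran
module two_site_apply
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double precision kind (assumed name)
contains

  ! Hypothetical wrapper around the loop nest from the question.
  ! REDUCTION(+:vecout) gives each thread a private, zero-initialized copy
  ! of vecout and sums the copies into the shared array when the loop ends,
  ! so no ATOMIC is needed inside the innermost loop.
  subroutine apply_reduction(nstates, n, mat, vecin, vecout)
    integer, intent(in)      :: nstates, n
    complex(dp), intent(in)  :: mat(:,:), vecin(:)
    complex(dp), intent(out) :: vecout(:)
    integer :: i, dima, dimb, dimc, a_row, b_row, b_col, c_row, row, col

    dimb = nstates**2        ! only read inside the loop, so it stays shared
    vecout = 0

    !$omp parallel do reduction(+:vecout) &
    !$omp private(dima, dimc, row, col, a_row, b_row, c_row, b_col)
    do i = 1, n - 1
      dima = nstates**(i-1)
      dimc = nstates**(n-i-1)
      do a_row = 1, dima
        do b_row = 1, dimb
          do c_row = 1, dimc
            row = ((a_row-1)*dimb + b_row - 1)*dimc + c_row
            do b_col = 1, dimb
              col = ((a_row-1)*dimb + b_col - 1)*dimc + c_row
              vecout(row) = vecout(row) + vecin(col)*mat(b_row,b_col)
            end do
          end do
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine apply_reduction

end module two_site_apply
```

A caller would `use two_site_apply` and invoke `call apply_reduction(nstates, N, mat, vecin, vecout)`. Compiled without the OpenMP flag, the directives are treated as comments and the loop runs serially, which gives a convenient reference for checking the parallel result. Note that the outer loop has only N-1 iterations, so either variant can keep at most N-1 threads busy.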
