OpenMP array reductions with Fortran

Question

I'm trying to parallelize a code I've written, and I'm having problems performing reductions on arrays. Everything seems to work fine for smallish arrays, but when the array size goes above a certain point I either get a stack overflow error or a crash.

I've tried increasing the stack size with /F at compile time (I'm using ifort on Windows), and I've also tried set KMP_STACKSIZE=xxx, the Intel-specific stack size setting. This sometimes helps and lets the code progress further through my loop, but in the end it doesn't resolve the issue, even with a stack size of 1 GB or greater.

Below is a small, self-contained working example of my code. It works in serial, with one thread, or with many threads and a small N. A large N (e.g. 250,000, as in the example) causes problems.

I didn't think these arrays were so massive as to cause major problems, and presumed that increasing my stack space would help. Are there any other options, or have I missed something important in my code?

program testreduction
    use omp_lib
    implicit none
    integer :: i, j, nthreads, Nsize
    REAL, allocatable :: A(:,:), B(:), C(:), posi(:,:)
    REAL :: dx, dy, r, Axi, Ayi, m, F
    !Set size of matrix, and main loop
    Nsize = 250000
    m = 1.0
    F = 1.0
    !Allocate posi array and fill it with random numbers in [0,1)
    allocate(posi(2,Nsize))
    call random_number(posi)
    !Allocate other arrays and zero them: the reduction adds the per-thread
    !partial sums to whatever the shared arrays already hold
    allocate(A(2,Nsize), C(Nsize), B(Nsize))
    A = 0.0
    B = 0.0
    C = 0.0

    print*, sizeof(A)+sizeof(B)+sizeof(C)
    !Let a single thread record the team size
    !$OMP parallel
    !$OMP single
    nthreads = omp_get_num_threads()
    !$OMP end single
    !$OMP end parallel

    print*, "Number of threads ", nthreads
    !Go through each array and do some work, calculating a reduction on A, B and C.
    !$OMP parallel do schedule(static) private(i, j, dx, dy, r, Axi, Ayi) reduction(+:C, B, A)
    do i=1,Nsize
        do j=1,Nsize
            !Skip the self-interaction: for j == i, r = 0 and dx/r is 0/0
            if (j == i) cycle
            dx = posi(1,i) - posi(1,j)
            dy = posi(2,i) - posi(2,j)
            r = sqrt(dx**2+dy**2)
            Axi = -m*(F)*(dx/(r))
            Ayi = -m*(F)*(dy/(r))
            A(1,i) = A(1,i) + Axi
            A(2,i) = A(2,i) + Ayi
            B(i) = B(i) + (Axi+Ayi)
            C(i) = C(i) + dx/(r) + dy/(r)
        end do
    END DO
    !$OMP END parallel do

end program

Update

A better example of what I'm talking about:

program testreduction2
    use omp_lib
    implicit none
    integer :: i, j, nthreads, Nsize, q, k, nsize2
    REAL, allocatable :: A(:,:), B(:), C(:)
    integer, ALLOCATABLE :: PAIRI(:), PAIRJ(:)

    Nsize = 25
    Nsize2 = 19
    q=0

    allocate(A(2,Nsize), C(Nsize), B(Nsize))
    ALLOCATE(PAIRI(nsize*nsize2), PAIRJ(nsize*nsize2))

    do i=1,nsize
        do j =1,nsize2
            q=q+1
            PAIRI(q) = i
            PAIRJ(q) = j
        end do
    end do

    A = 0
    B = 0
    C = 0

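    !No reduction clause here: several values of k map to the same i, so two
    !threads can update the same A(:,i), B(i) and C(i) at the same time
    !$OMP parallel do schedule(static) private(i, j, k)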
    do k=1,q
        i=PAIRI(k)
        j=PAIRJ(k)
        A(1,i) = A(1,i) + 1
        A(2,i) = A(2,i) + 1
        B(i) = B(i) + 1
        C(i) = C(i) + 1
    END DO
    !$OMP END parallel do

    PRINT*, A
    PRINT*, B
    PRINT*, C       
END PROGRAM

Recommended answer

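The problem is that you are reducing really large arrays. Every thread gets its own private copy of each array named in the reduction clause, and the compiler typically places these copies on the thread's stack, which is why raising the stack size with /F and KMP_STACKSIZE let the loop run further. Note that other languages (C, C++) could not reduce arrays at all until OpenMP 4.5.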
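Assuming the default 4-byte REAL, the private copies that reduction(+:C, B, A) creates in each thread for the question's Nsize = 250000 add up as follows:

```latex
\underbrace{2 \times 250\,000 \times 4}_{A} + \underbrace{250\,000 \times 4}_{B} + \underbrace{250\,000 \times 4}_{C} = 4\,000\,000\ \text{bytes} \approx 4\ \text{MB per thread}
```

That is the same total the question's print*, sizeof(A)+sizeof(B)+sizeof(C) line reports, and it is needed by every thread in the team on top of the shared originals.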

But I don't see any reason for a reduction in your case: A(:,i), B(i) and C(i) are updated only in iteration i of the parallel loop, so only one thread ever touches each element.

Try this:

!$OMP parallel do schedule(static) private(i, dx, dy, r, Axi, Ayi)
do i=1,Nsize
  do j=1,Nsize
    ! ... (dx, dy, r, Axi and Ayi computed as in the question)
    A(1,i) = A(1,i) + Axi
    A(2,i) = A(2,i) + Ayi
    B(i) = B(i) + (Axi+Ayi)
    C(i) = C(i) + dx/(r) + dy/(r)
  end do
end do
!$OMP END parallel do

The point is that the threads do not interfere. Every thread handles a different set of i values and therefore different elements of A, B and C. (A, B and C still have to be set to zero before the loop.)
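A self-contained sketch of that reduction-free loop is below. The module pair_forces and the subroutine compute_forces are hypothetical names, not from the answer; the loop body is the question's, with the j == i self-interaction skipped because r = 0 there. Each parallel iteration zeroes and then accumulates only its own A(:,i), B(i) and C(i), so no reduction clause and no per-thread array copies are needed:

```fortran
! Hypothetical sketch: the question's force loop without a reduction clause.
! Only iteration i of the parallel loop writes A(:,i), B(i) and C(i), so no
! two threads ever write the same element.
module pair_forces
    implicit none
contains
    subroutine compute_forces(posi, m, F, A, B, C)
        real, intent(in)  :: posi(:,:)      ! posi(1:2, 1:n) point coordinates
        real, intent(in)  :: m, F
        real, intent(out) :: A(:,:), B(:), C(:)
        integer :: i, j, n
        real :: dx, dy, r, Axi, Ayi

        n = size(posi, 2)
        !$omp parallel do schedule(static) private(j, dx, dy, r, Axi, Ayi)
        do i = 1, n
            ! This thread owns row i, so it can zero it without synchronisation
            A(:,i) = 0.0
            B(i) = 0.0
            C(i) = 0.0
            do j = 1, n
                if (j == i) cycle          ! skip self-interaction (r = 0)
                dx = posi(1,i) - posi(1,j)
                dy = posi(2,i) - posi(2,j)
                r  = sqrt(dx**2 + dy**2)
                Axi = -m*F*(dx/r)
                Ayi = -m*F*(dy/r)
                A(1,i) = A(1,i) + Axi
                A(2,i) = A(2,i) + Ayi
                B(i) = B(i) + (Axi + Ayi)
                C(i) = C(i) + dx/r + dy/r
            end do
        end do
        !$omp end parallel do
    end subroutine compute_forces
end module pair_forces
```

Called as call compute_forces(posi, m, F, A, B, C) with the arrays allocated as in the question, only scalars are private, so the thread stack size should no longer matter.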

Even if you come up with a case where a reduction seems to be necessary, you can always rewrite the loop to avoid it. You can also allocate the per-thread buffers yourself and simulate the reduction, or use atomic updates.
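For the pair-list loop in the question's update, where several values of k hit the same i, both alternatives might look like the sketch below. The module and subroutine names are hypothetical. accumulate_atomic protects each shared update with !$omp atomic; accumulate_buffers simulates the reduction with per-thread ALLOCATE'd buffers (normally on the heap rather than the stack) that are added into the shared arrays inside a critical section:

```fortran
! Hypothetical sketch: two ways to accumulate into A, B and C when several
! loop iterations can hit the same index i (the question's pair-list case).
module pair_accumulate
    implicit none
contains
    ! Option 1: protect every shared update with an atomic operation.
    subroutine accumulate_atomic(pairi, A, B, C)
        integer, intent(in) :: pairi(:)        ! pairi(k) = i for pair k
        real, intent(inout) :: A(:,:), B(:), C(:)
        integer :: k, i
        !$omp parallel do schedule(static) private(i)
        do k = 1, size(pairi)
            i = pairi(k)
            !$omp atomic
            A(1,i) = A(1,i) + 1.0
            !$omp atomic
            A(2,i) = A(2,i) + 1.0
            !$omp atomic
            B(i) = B(i) + 1.0
            !$omp atomic
            C(i) = C(i) + 1.0
        end do
        !$omp end parallel do
    end subroutine accumulate_atomic

    ! Option 2: a hand-written reduction. Each thread sums into its own
    ! allocatable buffers, then the buffers are added to the shared arrays
    ! one thread at a time.
    subroutine accumulate_buffers(pairi, A, B, C)
        integer, intent(in) :: pairi(:)
        real, intent(inout) :: A(:,:), B(:), C(:)
        real, allocatable :: Ap(:,:), Bp(:), Cp(:)
        integer :: k, i
        !$omp parallel private(Ap, Bp, Cp, i)
        allocate(Ap(size(A,1), size(A,2)), Bp(size(B)), Cp(size(C)))
        Ap = 0.0
        Bp = 0.0
        Cp = 0.0
        !$omp do schedule(static)
        do k = 1, size(pairi)
            i = pairi(k)
            Ap(:,i) = Ap(:,i) + 1.0
            Bp(i) = Bp(i) + 1.0
            Cp(i) = Cp(i) + 1.0
        end do
        !$omp end do
        ! Combine the partial sums; critical lets one thread add at a time
        !$omp critical
        A = A + Ap
        B = B + Bp
        C = C + Cp
        !$omp end critical
        deallocate(Ap, Bp, Cp)
        !$omp end parallel
    end subroutine accumulate_buffers
end module pair_accumulate
```

The atomic version pays for synchronisation on every update. The buffer version needs one full-size copy of A, B and C per thread, just like the reduction clause, but those copies come from ALLOCATE instead of the thread stack, and only the final additions are serialised.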