OpenMP array reductions with Fortran

Question

I'm trying to parallelize a code I've written, and I'm having problems performing reductions on arrays. It all seems to work fine for smallish arrays, but when the array size goes above a certain point I get either a stack overflow error or a crash.

I've tried increasing the stack size with the /F option at compile time (I'm using ifort on Windows), and I've also tried set KMP_STACKSIZE=xxx, the Intel-specific stack size setting. This sometimes helps and lets the code progress further through my loop, but in the end it doesn't resolve the issue, even with a stack size of 1 GB or more.

Below is a small, self-contained working example of my code. It works in serial, with one thread, and with many threads when N is small. A large N (e.g. 250,000, as in the example) causes problems.

I didn't think these arrays were massive enough to cause major problems, and I presumed increasing my stack space would help. Are there any other options, or have I missed something important in my code?

program testreduction
    use omp_lib
    implicit none
    integer :: i, j, nthreads, Nsize
    REAL, allocatable :: A(:,:), B(:), C(:), posi(:,:)
    REAL :: dx, dy, r, Axi, Ayi, m, F
    !Set size of matrix, and main loop
    Nsize = 250000
    m = 1.0
    F = 1.0
    !Allocate posi array and fill it with random numbers in [0,1)
    allocate(posi(2,Nsize))
    call random_number(posi)
    !Allocate other arrays; start them at zero, because the reduction adds
    !the threads' partial sums to the original values
    allocate(A(2,Nsize), C(Nsize), B(Nsize))
    A = 0.0
    B = 0.0
    C = 0.0

    !Total size of A, B and C in bytes
    print*, (size(A)+size(B)+size(C))*(storage_size(A)/8)
    !$OMP parallel
    !$OMP single
    nthreads = omp_get_num_threads()
    !$OMP end single
    !$OMP end parallel

    print*, "Number of threads ", nthreads
    !Go through each array and do some work, calculating a reduction on A, B and C.
    !$OMP parallel do schedule(static) private(i, j, dx, dy, r, Axi, Ayi) reduction(+:C, B, A)
    do i=1,Nsize
        do j=1,Nsize
            if (j == i) cycle   !skip self-interaction: r = 0 would give 0/0
            dx = posi(1,i) - posi(1,j)
            dy = posi(2,i) - posi(2,j)
            r = sqrt(dx**2+dy**2)
            Axi = -m*(F)*(dx/(r))
            Ayi = -m*(F)*(dy/(r))
            A(1,i) = A(1,i) + Axi
            A(2,i) = A(2,i) + Ayi
            B(i) = B(i) + (Axi+Ayi)
            C(i) = C(i) + dx/(r) + dy/(r)
        end do
    END DO
    !$OMP END parallel do

end program testreduction

Update

A better example of what I'm talking about:

program testreduction2
    use omp_lib
    implicit none
    integer :: i, j, nthreads, Nsize, q, k, nsize2
    REAL, allocatable :: A(:,:), B(:), C(:)
    integer, ALLOCATABLE :: PAIRI(:), PAIRJ(:)

    Nsize = 25
    Nsize2 = 19
    q=0

    allocate(A(2,Nsize), C(Nsize), B(Nsize))
    ALLOCATE(PAIRI(nsize*nsize2), PAIRJ(nsize*nsize2))

    do i=1,nsize
        do j =1,nsize2
            q=q+1
            PAIRI(q) = i
            PAIRJ(q) = j
        end do
    end do

    A = 0
    B = 0
    C = 0

    !$OMP parallel do schedule(static) private(i, j, k)
    do k=1,q
        i=PAIRI(k)
        j=PAIRJ(k)
        A(1,i) = A(1,i) + 1
        A(2,i) = A(2,i) + 1
        B(i) = B(i) + 1
        C(i) = C(i) + 1
    END DO
    !$OMP END parallel do

    PRINT*, A
    PRINT*, B
    PRINT*, C       
END PROGRAM

Answer

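The problem is that you are reducing really large arrays. Each thread gets its own private copy of every array named in the reduction clause, and the compiler may place those copies on each thread's stack, so the memory needed grows with both the array size and the number of threads. Note that C and C++ could not reduce arrays at all before OpenMP 4.5 allowed array sections in the reduction clause.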
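As a rough worked estimate for the first program, assuming the default 4-byte REAL and N = Nsize = 250,000, the private copies one thread needs for A (2 × N values), B (N) and C (N) come to:

$$
(2N + N + N) \times 4\ \text{bytes} = 16N\ \text{bytes} = 16 \times 250\,000\ \text{bytes} = 4 \times 10^{6}\ \text{bytes} \approx 3.8\ \text{MiB}
$$

So with T threads the reduction needs roughly T × 3.8 MiB of private storage on top of the shared arrays themselves.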

But I don't see any reason for the reduction in your case: each element is updated in only one iteration of the parallel outer loop.

Try:

!$OMP parallel do schedule(static) private(i, dx, dy, r, Axi, Ayi)
do i=1,Nsize
  do j=1,Nsize
    ...
    A(1,i) = A(1,i) + Axi
    A(2,i) = A(2,i) + Ayi
    B(i) = B(i) + (Axi+Ayi)
    C(i) = C(i) + dx/(r) + dy/(r)
  end do
end do
!$OMP END parallel do

The point is that the threads do not interfere. Each thread handles a different set of i values and therefore touches different elements of A, B and C.
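To make this concrete, here is a minimal sketch of the question's first loop with the reduction clause dropped, wrapped in a module so it can be called and tested. The module name pairwise_sums, the subroutine accumulate and its argument order are illustrative assumptions, and skipping j == i (which would otherwise give 0/0) is an addition not present in the original loop.

```fortran
! Sketch: the question's O(N^2) loop without a reduction clause.
! Hypothetical names: module pairwise_sums, subroutine accumulate.
module pairwise_sums
  implicit none
contains
  subroutine accumulate(posi, m, F, A, B, C)
    real, intent(in)  :: posi(:,:)           ! positions, shape (2, n)
    real, intent(in)  :: m, F
    real, intent(out) :: A(:,:), B(:), C(:)  ! shapes (2, n), (n), (n)
    integer :: i, j, n
    real :: dx, dy, r, Axi, Ayi

    n = size(posi, 2)
    A = 0.0
    B = 0.0
    C = 0.0
    ! Iteration i writes only A(:,i), B(i) and C(i), and each i is run by
    ! exactly one thread, so the shared arrays need no reduction or atomics.
    !$omp parallel do schedule(static) private(j, dx, dy, r, Axi, Ayi)
    do i = 1, n
      do j = 1, n
        if (j == i) cycle            ! assumption: skip the r = 0 self term
        dx = posi(1,i) - posi(1,j)
        dy = posi(2,i) - posi(2,j)
        r  = sqrt(dx**2 + dy**2)
        Axi = -m*F*(dx/r)
        Ayi = -m*F*(dy/r)
        A(1,i) = A(1,i) + Axi
        A(2,i) = A(2,i) + Ayi
        B(i) = B(i) + (Axi + Ayi)
        C(i) = C(i) + dx/r + dy/r
      end do
    end do
    !$omp end parallel do
  end subroutine accumulate
end module pairwise_sums
```

With the arrays allocated as in the question, the call is simply call accumulate(posi, m, F, A, B, C). Only the loop scalars are private, so the per-thread stack use no longer depends on Nsize.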

Even if you come up with a case where a reduction seems necessary, you can always rewrite it to avoid the reduction clause: allocate buffers yourself and combine them at the end to simulate the reduction, or use atomic updates.
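For the pair-list loop in the second program, where several k iterations update the same i and the threads really can collide, the two alternatives above might look like the sketch below. It assumes only the PAIRI list from that program; the module and subroutine names are hypothetical. count_atomic protects each update with !$omp atomic, while count_buffered gives each thread its own zeroed, heap-allocated buffers and adds them into the shared arrays inside a critical section, which is roughly what the reduction clause does, but without relying on the thread stack.

```fortran
! Sketch: two ways to accumulate the second program's pair list safely.
! Hypothetical names: module pair_updates, count_atomic, count_buffered.
module pair_updates
  implicit none
contains
  ! Option 1: atomic updates on the shared arrays.
  subroutine count_atomic(pairi, A, B, C)
    integer, intent(in) :: pairi(:)
    real, intent(out)   :: A(:,:), B(:), C(:)
    integer :: k, i
    A = 0.0
    B = 0.0
    C = 0.0
    !$omp parallel do schedule(static) private(i)
    do k = 1, size(pairi)
      i = pairi(k)
      ! each increment is atomic, so two threads hitting the same i lose nothing
      !$omp atomic
      A(1,i) = A(1,i) + 1.0
      !$omp atomic
      A(2,i) = A(2,i) + 1.0
      !$omp atomic
      B(i) = B(i) + 1.0
      !$omp atomic
      C(i) = C(i) + 1.0
    end do
    !$omp end parallel do
  end subroutine count_atomic

  ! Option 2: manual reduction through per-thread heap buffers.
  subroutine count_buffered(pairi, A, B, C)
    integer, intent(in) :: pairi(:)
    real, intent(out)   :: A(:,:), B(:), C(:)
    real, allocatable   :: Abuf(:,:), Bbuf(:), Cbuf(:)
    integer :: k, i, n
    n = size(B)
    A = 0.0
    B = 0.0
    C = 0.0
    !$omp parallel private(Abuf, Bbuf, Cbuf, k, i)
    ! every thread allocates and zeroes its own buffers (on the heap)
    allocate(Abuf(2,n), Bbuf(n), Cbuf(n))
    Abuf = 0.0
    Bbuf = 0.0
    Cbuf = 0.0
    !$omp do schedule(static)
    do k = 1, size(pairi)
      i = pairi(k)
      Abuf(1,i) = Abuf(1,i) + 1.0
      Abuf(2,i) = Abuf(2,i) + 1.0
      Bbuf(i) = Bbuf(i) + 1.0
      Cbuf(i) = Cbuf(i) + 1.0
    end do
    !$omp end do
    ! add this thread's partial sums into the shared arrays, one thread at a time
    !$omp critical
    A = A + Abuf
    B = B + Bbuf
    C = C + Cbuf
    !$omp end critical
    deallocate(Abuf, Bbuf, Cbuf)
    !$omp end parallel
  end subroutine count_buffered
end module pair_updates
```

The atomic version needs no extra memory but pays for every contended update; the buffered version costs one extra copy of the arrays per thread, placed on the heap where it is not limited by KMP_STACKSIZE.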
