OpenMP array reductions with Fortran
Question
I'm trying to parallelize a code I've written, and I'm having problems performing reductions on arrays. It all seems to work fine for smallish arrays, but when the array size goes above a certain point I get either a stack overflow error or a crash.
I've tried increasing the stack size with /F at compile time (I'm using ifort on Windows), and I've also tried set KMP_STACKSIZE=xxx, the Intel-specific stack size setting. This sometimes helps and lets the code progress further through my loop, but in the end it doesn't resolve the issue, even with a stack size of 1 GB or more.
Below is a small self-contained working example of my code. It works in serial and with one thread, and with many threads when N is small. A large N (such as the 250,000 in the example) causes problems.
I didn't think these arrays were massive enough to cause major problems, and I presumed increasing my stack space would help. Are there any other options, or have I missed something important in my coding?
program testreduction
    use omp_lib
    implicit none
    integer :: i, j, nthreads, Nsize
    integer iseed /3/
    REAL, allocatable :: A(:,:), B(:), C(:), posi(:,:)
    REAL :: dx, dy, r, Axi, Ayi, m, F

    !Set size of matrix, and main loop
    Nsize = 250000
    m = 1.0
    F = 1.0

    !Allocate posi array
    allocate(posi(2,Nsize))

    !Fill with random numbers
    do i=1,Nsize
        do j=1,2
            posi(j,i) = (ran(iseed))
        end do
    end do

    !Allocate other arrays
    allocate(A(2,Nsize), C(Nsize), B(Nsize))

    print*, sizeof(A)+sizeof(B)+sizeof(C)

    !$OMP parallel
    nthreads = omp_get_num_threads()
    !$OMP end parallel

    print*, "Number of threads ", nthreads

    !Go through each array and do some work, calculating a reduction on A, B and C.
    !$OMP parallel do schedule(static) private(i, j, dx, dy, r, Axi, Ayi) reduction(+:C, B, A)
    do i=1,Nsize
        do j=1,Nsize
            !print*, i
            dx = posi(1,i) - posi(1,j)
            dy = posi(2,i) - posi(2,j)
            r = sqrt(dx**2+dy**2)
            Axi = -m*(F)*(dx/(r))
            Ayi = -m*(F)*(dy/(r))
            A(1,i) = A(1,i) + Axi
            A(2,i) = A(2,i) + Ayi
            B(i) = B(i) + (Axi+Ayi)
            C(i) = C(i) + dx/(r) + dy/(r)
        end do
    END DO
    !$OMP END parallel do

end program
Update
A better example of what I'm talking about:
program testreduction2
    use omp_lib
    implicit none
    integer :: i, j, nthreads, Nsize, q, k, nsize2
    REAL, allocatable :: A(:,:), B(:), C(:)
    integer, ALLOCATABLE :: PAIRI(:), PAIRJ(:)

    Nsize = 25
    Nsize2 = 19
    q=0

    allocate(A(2,Nsize), C(Nsize), B(Nsize))
    ALLOCATE(PAIRI(nsize*nsize2), PAIRJ(nsize*nsize2))

    do i=1,nsize
        do j =1,nsize2
            q=q+1
            PAIRI(q) = i
            PAIRJ(q) = j
        end do
    end do

    A = 0
    B = 0
    C = 0

    !$OMP parallel do schedule(static) private(i, j, k)
    do k=1,q
        i=PAIRI(k)
        j=PAIRJ(k)
        A(1,i) = A(1,i) + 1
        A(2,i) = A(2,i) + 1
        B(i) = B(i) + 1
        C(i) = C(i) + 1
    END DO
    !$OMP END parallel do

    PRINT*, A
    PRINT*, B
    PRINT*, C

END PROGRAM
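Recommended answer
The problem is that you are reducing really large arrays. A reduction gives every thread its own private copy of each reduction array, and compilers typically place those copies on the thread's stack; for Nsize = 250,000 the copies of A, B and C come to about 4 MB per thread with default (4-byte) REAL. Note that other languages (C, C++) could not reduce arrays at all until OpenMP 4.5.
But I don't see any reason for the reduction in your first example: the elements with index i are updated only by the thread that executes iteration i.
Try this: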
!$OMP parallel do schedule(static) private(i, dx, dy, r, Axi, Ayi)
do i=1,Nsize
    do j=1,Nsize
        ...
        A(1,i) = A(1,i) + Axi
        A(2,i) = A(2,i) + Ayi
        B(i) = B(i) + (Axi+Ayi)
        C(i) = C(i) + dx/(r) + dy/(r)
    end do
end do
!$OMP END parallel do
The point is that the threads do not interfere with each other. Every thread works on a different set of i values and therefore on different elements of A, B and C.
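As a rough, self-contained sketch of the loop above without the reduction clause, the program below runs the same update in parallel and compares A against a serial reference. Several details are assumptions of this sketch and not part of the original code: the small size n = 2000 (so the check finishes quickly), random_number in place of the non-standard ran(), the guard that skips r == 0, and the comparison tolerance.

```fortran
program no_reduction_sketch
   implicit none
   integer, parameter :: n = 2000      ! assumed small size so the serial check is quick
   real, parameter :: m = 1.0, F = 1.0
   integer :: i, j
   real, allocatable :: posi(:,:), A(:,:), B(:), C(:), Aref(:,:)
   real :: dx, dy, r, Axi, Ayi

   allocate(posi(2,n), A(2,n), B(n), C(n), Aref(2,n))
   call random_number(posi)            ! standard intrinsic, used here instead of ran()
   A = 0.0; B = 0.0; C = 0.0; Aref = 0.0

   ! No reduction clause: A, B and C stay shared, and each thread only
   ! writes the elements that belong to its own values of i.
   !$omp parallel do schedule(static) private(i, j, dx, dy, r, Axi, Ayi)
   do i = 1, n
      do j = 1, n
         dx = posi(1,i) - posi(1,j)
         dy = posi(2,i) - posi(2,j)
         r = sqrt(dx**2 + dy**2)
         if (r == 0.0) cycle           ! added guard: skips j == i, where dx/r would be 0/0
         Axi = -m*F*dx/r
         Ayi = -m*F*dy/r
         A(1,i) = A(1,i) + Axi
         A(2,i) = A(2,i) + Ayi
         B(i) = B(i) + (Axi + Ayi)
         C(i) = C(i) + dx/r + dy/r
      end do
   end do
   !$omp end parallel do

   ! Serial reference for A only, used to check the parallel result.
   do i = 1, n
      do j = 1, n
         dx = posi(1,i) - posi(1,j)
         dy = posi(2,i) - posi(2,j)
         r = sqrt(dx**2 + dy**2)
         if (r == 0.0) cycle
         Aref(1,i) = Aref(1,i) - m*F*dx/r
         Aref(2,i) = Aref(2,i) - m*F*dy/r
      end do
   end do

   ! Loose tolerance: the compiler may order the floating-point sums differently.
   if (maxval(abs(A - Aref)) > 1.0e-3*max(1.0, maxval(abs(Aref)))) then
      error stop "parallel and serial results differ"
   end if
   print *, "parallel and serial results agree"
end program no_reduction_sketch
```

Because the arrays are shared, no per-thread copies are created, so memory use no longer grows with the number of threads. The sketch should build with OpenMP enabled (for example -fopenmp with gfortran or /Qopenmp with ifort on Windows) and also as a plain serial program.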
Even if you come up with a case where a reduction seems necessary (such as your second example, where several values of k map to the same i), you can always rewrite the code to avoid it. You can allocate some buffers yourself and simulate the reduction, or use atomic updates.
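Both alternatives can be sketched on the pair-list pattern from the question's update, where several k values hit the same i. This is only an illustration under some assumptions: it tracks a single array, and the names B_atomic, B_buf and buf are hypothetical. The first loop protects each update with an atomic directive; the second gives each thread its own column of a heap-allocated buffer and sums the columns afterwards, which mimics a reduction without using the stack.

```fortran
program scatter_update_sketch
   use omp_lib
   implicit none
   integer, parameter :: nsize = 25, nsize2 = 19
   integer :: i, j, k, q, t, nthreads
   integer, allocatable :: pairi(:)
   real, allocatable :: B_atomic(:), B_buf(:), buf(:,:)

   allocate(pairi(nsize*nsize2), B_atomic(nsize), B_buf(nsize))

   ! Build the index list as in the question: each i appears nsize2 times.
   q = 0
   do i = 1, nsize
      do j = 1, nsize2
         q = q + 1
         pairi(q) = i
      end do
   end do

   ! Variant 1: atomic updates on the shared array.
   B_atomic = 0.0
   !$omp parallel do schedule(static) private(k, i)
   do k = 1, q
      i = pairi(k)
      !$omp atomic
      B_atomic(i) = B_atomic(i) + 1.0
   end do
   !$omp end parallel do

   ! Variant 2: one heap-allocated buffer column per thread, summed afterwards.
   nthreads = omp_get_max_threads()    ! upper bound on the team size
   allocate(buf(nsize, nthreads))
   buf = 0.0
   !$omp parallel private(k, i, t)
   t = omp_get_thread_num() + 1        ! this thread's column in buf
   !$omp do schedule(static)
   do k = 1, q
      i = pairi(k)
      buf(i, t) = buf(i, t) + 1.0
   end do
   !$omp end do
   !$omp end parallel
   B_buf = sum(buf, dim=2)             ! combine the per-thread partial results

   ! Every element receives exactly nsize2 increments in both variants.
   if (any(B_atomic /= real(nsize2)) .or. any(B_buf /= real(nsize2))) then
      error stop "unexpected result"
   end if
   print *, "both variants give", nsize2, "in every element"
end program scatter_update_sketch
```

Atomic updates need no extra memory but serialize conflicting writes; the buffer variant avoids that contention at the cost of one array copy per thread, held on the heap where it is not limited by the thread stack size. The same pattern would apply to A and C.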