Memory error when using OpenMP with Fortran, running FFTW
Problem description
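I am testing FFTW in a Fortran program, because I need to use it. Since I am working with huge matrices, my first solution is to use OpenMP. When my matrix has dimension 500 x 500 x 500, the following error happens:

Operating system error:
Program aborted. Backtrace:
Cannot allocate memory
Allocation would exceed memory limit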
I compiled the code using the following:

gfortran -o test teste_fftw_openmp.f90 -I/usr/local/include -L/usr/lib/x86_64-linux-gnu -lfftw3_omp -lfftw3 -lm -fopenmp

PROGRAM test_fftw
  USE omp_lib
  USE, intrinsic :: iso_c_binding
  IMPLICIT NONE
  INCLUDE 'fftw3.f'
  INTEGER :: i, DD=500
  DOUBLE COMPLEX :: OUTPUT_FFTW(3,3,3)
  DOUBLE COMPLEX, ALLOCATABLE :: A3D(:,:,:), FINAL_OUTPUT(:,:,:)
  integer*8 :: plan
  integer :: iret, nthreads
  INTEGER :: indiceX, indiceY, indiceZ, window=2

  !! TESTING 3D FFTW with OPENMP
  ALLOCATE(A3D(DD,DD,DD))
  ALLOCATE(FINAL_OUTPUT(DD-2,DD-2,DD-2))
  write(*,*) '---------------'
  write(*,*) '------------TEST 3D FFTW WITH OPENMP----------'
  A3D = reshape((/(i, i=1,DD*DD*DD)/), shape(A3D))

  CALL dfftw_init_threads(iret)
  CALL dfftw_plan_with_nthreads(nthreads)
  CALL dfftw_plan_dft_3d(plan, 3,3,3, OUTPUT_FFTW, OUTPUT_FFTW, FFTW_FORWARD, FFTW_ESTIMATE)
  FINAL_OUTPUT = 0.

  !$OMP PARALLEL DO DEFAULT(SHARED) SHARED(A3D,plan,window) &
  !$OMP PRIVATE(indiceX, indiceY, indiceZ, OUTPUT_FFTW, FINAL_OUTPUT)
  DO indiceZ = 1, 10   ! 500-window
    write(*,*) 'INDICE Z=', indiceZ
    DO indiceY = 1, 10   ! 500-window
      DO indiceX = 1, 10   ! 500-window
        CALL dfftw_execute_dft(plan, &
             A3D(indiceX:indiceX+window, indiceY:indiceY+window, indiceZ:indiceZ+window), &
             OUTPUT_FFTW)
        FINAL_OUTPUT(indiceX,indiceY,indiceZ) = SUM(ABS(OUTPUT_FFTW))
      ENDDO
    ENDDO
  ENDDO
  !$OMP END PARALLEL DO

  call dfftw_destroy_plan(plan)
  CALL dfftw_cleanup_threads()
  DEALLOCATE(A3D, FINAL_OUTPUT)
END PROGRAM test_fftw

Notice that this error occurs when I use just one huge matrix (A3D), without even running the loop over all the values of this matrix (to run over all values, the limits of the three nested loops should be 500-window).

I tried to solve this (tips here and here) with -mcmodel=medium in the compilation, without success.

I had success when I compiled with:

gfortran -o test teste_fftw_openmp.f90 -I/usr/local/include -L/usr/lib/x86_64-linux-gnu -lfftw3_omp -lfftw3 -lm -fopenmp -fmax-stack-var-size=65536

So, I don't understand:
1) Why is there a memory allocation problem if the huge matrix is a shared variable?
2) Will the solution I found still work if I have more huge matrix variables? For example, 3 more matrices of 500 x 500 x 500 to store calculation results.
3) In the tips I found, people said that using allocatable arrays/matrices would solve the problem, but I was already using them and it made no difference. Is there anything else I need to do for this?

Answer

Two double complex arrays with 500 x 500 x 500 elements require 4 gigabytes of memory. It is likely that the amount of available memory in your computer is not sufficient.

If you only work with small windows, you might consider not keeping the whole array in memory the whole time, but only the parts you are currently working on. Or distribute the computation across multiple computers using MPI. Or just use a computer with more RAM.
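
The memory figure for the two 500 x 500 x 500 double complex arrays can be checked with a few lines of arithmetic. The program below is only a sketch written for this note (the program name and variable names are not from the original code); it assumes 16 bytes per double complex element, which it queries with the Fortran 2008 intrinsic storage_size rather than hard-coding.

```fortran
program memory_estimate
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer(int64), parameter :: dd = 500_int64    ! same DD as in the question
  complex(dp) :: sample                          ! only used to query the element size
  integer(int64) :: bytes_per_element, bytes_a3d, bytes_final

  ! storage_size returns the size of one element in bits.
  bytes_per_element = storage_size(sample, kind=int64) / 8_int64

  bytes_a3d   = dd**3 * bytes_per_element                ! A3D(DD,DD,DD)
  bytes_final = (dd - 2_int64)**3 * bytes_per_element    ! FINAL_OUTPUT(DD-2,DD-2,DD-2)

  write(*,'(a,f6.2,a)') 'A3D:          ', real(bytes_a3d, dp) / 1.0e9_dp, ' GB'
  write(*,'(a,f6.2,a)') 'FINAL_OUTPUT: ', real(bytes_final, dp) / 1.0e9_dp, ' GB'
  write(*,'(a,f6.2,a)') 'Total:        ', &
       real(bytes_a3d + bytes_final, dp) / 1.0e9_dp, ' GB'
end program memory_estimate
```

Each array comes to roughly 2 GB, so the pair needs about 4 GB before any temporaries are counted. Every additional 500 x 500 x 500 double complex array adds roughly another 2 GB.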
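
One detail of the posted loop may also matter for memory: FINAL_OUTPUT is listed in the PRIVATE clause. As far as I understand the OpenMP rules, a private copy of an allocated allocatable array is created for each thread with the same bounds as the original, which would cost roughly one more 2 GB array per thread, and the values written inside the loop would not reach the original array. The sketch below shows the alternative data-sharing layout, with the large arrays shared and only the small window buffer private. It is an illustration, not the original program: the array size is reduced, the names buffer and dp are made up for this example, final_output is declared real because SUM(ABS(...)) is real, and the FFTW call is replaced by a placeholder so that the example has no external dependency.

```fortran
program shared_output_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: dd = 50        ! reduced from 500 so the sketch runs anywhere
  integer, parameter :: window = 2
  complex(dp), allocatable :: a3d(:,:,:)
  real(dp), allocatable :: final_output(:,:,:)
  complex(dp) :: buffer(window+1, window+1, window+1)   ! small per-thread workspace
  integer :: ix, iy, iz

  allocate(a3d(dd, dd, dd))
  allocate(final_output(dd-window, dd-window, dd-window))
  a3d = (1.0_dp, 0.0_dp)
  final_output = 0.0_dp

  ! The large arrays are shared: every thread writes to distinct elements of
  ! final_output, so there is no race. Only the loop indices and the small
  ! window buffer are private.
  !$omp parallel do default(none) shared(a3d, final_output) &
  !$omp private(ix, iy, iz, buffer)
  do iz = 1, dd - window
    do iy = 1, dd - window
      do ix = 1, dd - window
        ! Copy the window into a contiguous private buffer.
        buffer = a3d(ix:ix+window, iy:iy+window, iz:iz+window)
        ! Placeholder for the per-window transform (the FFTW call in the question).
        final_output(ix, iy, iz) = sum(abs(buffer))
      end do
    end do
  end do
  !$omp end parallel do

  write(*,*) 'final_output(1,1,1) =', final_output(1,1,1)
  deallocate(a3d, final_output)
end program shared_output_sketch
```

Compiled with gfortran -fopenmp, this keeps a single copy of each large array regardless of the number of threads. Whether the FFTW plan can be executed concurrently on per-thread buffers in this way should be checked against the FFTW documentation on thread safety before adapting the sketch.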