Memory error when using OpenMP with Fortran, running FFTW


Problem description


I am testing FFTW in a Fortran program, because I need to use it. Since I am working with huge matrices, my first approach was to use OpenMP. When my matrix has dimensions 500 x 500 x 500, the following error occurs:

Operating system error: 
Program aborted. Backtrace:
Cannot allocate memory
Allocation would exceed memory limit

I compiled the code using the following: gfortran -o test teste_fftw_openmp.f90 -I/usr/local/include -L/usr/lib/x86_64-linux-gnu -lfftw3_omp -lfftw3 -lm -fopenmp

PROGRAM test_fftw
USE omp_lib      
USE, intrinsic:: iso_c_binding
IMPLICIT NONE
INCLUDE 'fftw3.f'
INTEGER::i, DD=500
DOUBLE COMPLEX:: OUTPUT_FFTW(3,3,3) 
DOUBLE COMPLEX, ALLOCATABLE:: A3D(:,:,:), FINAL_OUTPUT(:,:,:)
integer*8:: plan
integer::iret, nthreads
INTEGER:: indiceX, indiceY, indiceZ, window=2

!! TESTING 3D FFTW with OPENMP
ALLOCATE(A3D(DD,DD,DD))
ALLOCATE(FINAL_OUTPUT(DD-2,DD-2,DD-2))
write(*,*) '---------------'
write(*,*) '------------TEST 3D FFTW WITH OPENMP----------'
A3D = reshape((/(i, i=1,DD*DD*DD)/),shape(A3D))

CALL dfftw_init_threads(iret)
CALL dfftw_plan_with_nthreads(nthreads)

CALL dfftw_plan_dft_3d(plan, 3,3,3, OUTPUT_FFTW, OUTPUT_FFTW, FFTW_FORWARD, FFTW_ESTIMATE)
FINAL_OUTPUT=0.
!$OMP PARALLEL DO DEFAULT(SHARED) SHARED(A3D,plan,window) &
!$OMP PRIVATE(indiceX, indiceY, indiceZ, OUTPUT_FFTW, FINAL_OUTPUT)
DO indiceZ=1,10!500-window
    write(*,*) 'INDICE Z=', indiceZ
    DO indiceY=1,10!500-window
        DO indiceX=1,10!500-window
            CALL dfftw_execute_dft(plan, A3D(indiceX:indiceX+window,indiceY:indiceY+window, indiceZ:indiceZ+window), OUTPUT_FFTW)
            FINAL_OUTPUT(indiceX,indiceY,indiceZ)=SUM(ABS(OUTPUT_FFTW))
        ENDDO    
    ENDDO    
ENDDO
!$OMP END PARALLEL DO
call dfftw_destroy_plan(plan)
CALL dfftw_cleanup_threads()
DEALLOCATE(A3D,FINAL_OUTPUT)
END PROGRAM test_fftw

Note that this error occurs even though I only allocate the huge matrix (A3D) and do not run the loop over all of its values (to cover all values, the limits of the three nested loops should be 500-window). Following tips I found, I tried to solve this by compiling with -mcmodel=medium, without success. I had success when I compiled with: gfortran -o test teste_fftw_openmp.f90 -I/usr/local/include -L/usr/lib/x86_64-linux-gnu -lfftw3_omp -lfftw3 -lm -fopenmp -fmax-stack-var-size=65536

So, I don't understand:
1) Why is there a memory allocation problem if the huge matrix is a shared variable?
2) Will the solution I found still work if I have more huge matrix variables, for example 3 more 500 x 500 x 500 matrices to store calculation results?
3) In the tips I found, people said that using allocatable arrays/matrices would solve the problem, but I was already using them and it made no difference. Is there anything else I need to do?

Solution

Two double complex arrays with 500 x 500 x 500 elements require about 4 gigabytes of memory: each element takes 16 bytes, so each array needs about 2 GB. It is likely that the amount of available memory in your computer is not sufficient.
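
The arithmetic behind that estimate can be checked with a short program. This is only a sketch: the module name mem_estimate and the function dcomplex_bytes are made up for illustration, and it assumes a double complex element occupies whatever storage_size reports on your compiler (16 bytes on common platforms).

```fortran
module mem_estimate
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Bytes needed for an n1 x n2 x n3 double complex array.
  ! (Hypothetical helper, not part of FFTW or OpenMP.)
  pure function dcomplex_bytes(n1, n2, n3) result(nbytes)
    integer, intent(in) :: n1, n2, n3
    integer(int64) :: nbytes
    complex(kind(0.0d0)), parameter :: probe = (0.0d0, 0.0d0)
    ! storage_size returns bits; 64-bit arithmetic avoids integer overflow.
    nbytes = int(n1, int64) * int(n2, int64) * int(n3, int64) &
             * int(storage_size(probe) / 8, int64)
  end function dcomplex_bytes
end module mem_estimate

program show_mem
  use mem_estimate
  implicit none
  integer(int64) :: a3d_bytes, out_bytes

  a3d_bytes = dcomplex_bytes(500, 500, 500)   ! A3D(DD,DD,DD)
  out_bytes = dcomplex_bytes(498, 498, 498)   ! FINAL_OUTPUT(DD-2,DD-2,DD-2)

  write(*,'(a,f0.2,a)') 'A3D:          ', real(a3d_bytes, kind(0.0d0)) / 1.0d9, ' GB'
  write(*,'(a,f0.2,a)') 'FINAL_OUTPUT: ', real(out_bytes, kind(0.0d0)) / 1.0d9, ' GB'
  write(*,'(a,f0.2,a)') 'Total:        ', &
       real(a3d_bytes + out_bytes, kind(0.0d0)) / 1.0d9, ' GB'
end program show_mem
```

One thing that may be worth checking in the posted code: FINAL_OUTPUT appears in the PRIVATE clause. Under OpenMP rules, a private allocatable array that is already allocated gets a separate copy with the same bounds in every thread, so the total could grow with the number of threads rather than staying at the figure above.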

If you only work with small windows, you might consider not keeping the whole array in memory at all times, but only the parts currently needed. Or distribute the computation across multiple computers using MPI.
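
One way to keep only part of the array in memory is a sliding slab of window+1 z-planes. The sketch below is an illustration under several assumptions, not the asker's program: fill_plane is a hypothetical stand-in for however the data is actually produced or read, the per-window FFT is replaced by a plain SUM(ABS(...)) so the example runs without FFTW, and the results are reduced to a single checksum for brevity.

```fortran
module slab_demo
  implicit none
  integer, parameter :: dp = kind(0.0d0)
contains
  ! Hypothetical stand-in for reading one z-plane (e.g. from disk).
  ! It reproduces the 1, 2, 3, ... fill pattern of the original A3D.
  subroutine fill_plane(plane, z, n)
    complex(dp), intent(out) :: plane(:, :)
    integer, intent(in) :: z, n
    integer :: i, j
    do j = 1, n
      do i = 1, n
        plane(i, j) = cmplx(i + n * (j - 1) + n * n * (z - 1), 0, dp)
      end do
    end do
  end subroutine fill_plane

  ! Visit every (window+1)^3 sub-block of an n x n x n volume while
  ! holding only window+1 planes in memory at any time.
  function process_volume(n, window) result(total)
    integer, intent(in) :: n, window
    real(dp) :: total
    complex(dp), allocatable :: slab(:, :, :)
    integer :: ix, iy, iz, k

    allocate(slab(n, n, 0:window))
    do k = 0, window                      ! preload planes 1 .. window+1
      call fill_plane(slab(:, :, k), 1 + k, n)
    end do

    total = 0.0_dp
    do iz = 1, n - window
      do iy = 1, n - window
        do ix = 1, n - window
          ! Placeholder for the per-window transform in the original post.
          total = total + sum(abs(slab(ix:ix+window, iy:iy+window, :)))
        end do
      end do
      ! Slide the slab: drop the oldest plane, load the next one.
      if (iz < n - window) then
        slab(:, :, 0:window-1) = slab(:, :, 1:window)
        call fill_plane(slab(:, :, window), iz + window + 1, n)
      end if
    end do
  end function process_volume
end module slab_demo

program sliding_slab
  use slab_demo
  implicit none
  ! Small demo size; with n = 500 and window = 2 the slab would be
  ! 500 x 500 x 3 elements, i.e. about 12 MB instead of about 2 GB.
  write(*,*) 'checksum =', process_volume(8, 2)
end program sliding_slab
```

In a real run the per-plane results would be written out or stored in a smaller output array instead of being summed, and the inner loops are the natural place for an OpenMP directive.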

Or just use a computer with more RAM.
