MPI_REDUCE causing memory leak

Problem description

I have recently encountered a weird behavior. If I run the following code on my machine (using the most recent version of Cygwin, Open MPI version 1.8.6), I get linearly growing memory usage that quickly overwhelms my PC.

program memoryTest

use mpi

implicit none

integer            :: ierror,errorStatus      ! error codes
integer            :: my_rank                 ! rank of process
integer            :: p                       ! number of processes
integer            :: i,a,b

call MPI_Init(ierror)
call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierror)
call MPI_Comm_size(MPI_COMM_WORLD, p, ierror)

b=0
do i=1,10000000
    a=1*my_rank
    call MPI_REDUCE(a,b,1,MPI_INTEGER,MPI_MAX,0,MPI_COMM_WORLD,errorStatus)
end do

call MPI_Finalize(ierror)

stop
end program memoryTest

Any idea what the problem might be? The code looks fine to my beginner's eyes. The compilation line is

mpif90 -O2 -o memoryTest.exe memoryTest.f90

Solution

This has been discussed in a related thread here.

The problem is that the root process needs to receive data from the other processes and perform the reduction, while the other processes only need to send their data to the root. So the root process runs slower and can be overwhelmed by the number of incoming messages. If you insert an MPI_BARRIER call after the MPI_REDUCE call, the code should run without a problem.
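A minimal sketch of that fix, reusing the variables from the code above, would change the loop to:

do i=1,10000000
    a=1*my_rank
    call MPI_REDUCE(a,b,1,MPI_INTEGER,MPI_MAX,0,MPI_COMM_WORLD,errorStatus)
    ! synchronize all ranks each iteration so unprocessed reduction messages cannot pile up at the root
    call MPI_BARRIER(MPI_COMM_WORLD,ierror)
end do

The barrier throttles the faster non-root ranks to the pace of the root, trading some per-iteration latency for bounded memory use.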

The relevant part of the MPI specification says: "Collective operations can (but are not required to) complete as soon as the caller's participation in the collective communication is finished. A blocking operation is complete as soon as the call returns. A nonblocking (immediate) call requires a separate completion call (cf. Section 3.7). The completion of a collective operation indicates that the caller is free to modify locations in the communication buffer. It does not indicate that other processes in the group have completed or even started the operation (unless otherwise implied by the description of the operation). Thus, a collective communication operation may, or may not, have the effect of synchronizing all calling processes. This statement excludes, of course, the barrier operation."
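To illustrate the completion rule the quote refers to ("a nonblocking (immediate) call requires a separate completion call"), here is a minimal sketch of the same reduction written with the MPI-3 nonblocking variant, assuming the installed MPI library provides MPI_IREDUCE. Note that this alone does not add the synchronization discussed above; it only shows the separate completion call.

integer :: request    ! added to the declaration block

do i=1,10000000
    a=1*my_rank
    ! the nonblocking reduction returns immediately; completion is a separate call
    call MPI_IREDUCE(a,b,1,MPI_INTEGER,MPI_MAX,0,MPI_COMM_WORLD,request,ierror)
    ! a and b may only be reused after MPI_WAIT completes the operation locally
    call MPI_WAIT(request,MPI_STATUS_IGNORE,ierror)
end do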
