MPI alters a variable it shouldn't

Question

I have some Fortran code that I'm parallelizing with MPI which is doing truly bizarre things. First, there's a variable nstartg that I broadcast from the boss process to all the workers:

call mpi_bcast(nstartg,1,mpi_integer,0,mpi_comm_world,ierr)

The variable nstartg is never altered again in the program. Later on, I have the boss process send eproc elements of an array edge to the workers:

if (me==0) then
    do n=1,ntasks-1
        ! (determine the starting point estart and the number eproc
        !  of values to send)
        call mpi_send(edge(estart),eproc,mpi_integer,n,n,mpi_comm_world,ierr)
    enddo
endif

with a matching receive statement if me is non-zero. (I've left out some other code for readability; there's a good reason I'm not using scatterv.)

Here's where things get weird: the variable nstartg gets altered to n instead of keeping its actual value. For example, on process 1, after the mpi_recv, nstartg = 1, and on process 2 it's equal to 2, and so forth. Moreover, if I change the code above to

call mpi_send(edge(estart),eproc,mpi_integer,n,n+1234567,mpi_comm_world,ierr)

and change the tag accordingly in the matching call to mpi_recv, then on process 1, nstartg = 1234568; on process 2, nstartg = 1234569, etc.

What on earth is going on? All I've changed is the tag that mpi_send/recv are using to identify the message; provided the tags are unique so that the messages don't get mixed up, this shouldn't change anything, and yet it's altering a totally unrelated variable.

On the boss process, nstartg is unaltered, so I can fix this by broadcasting it again, but that's hardly a real solution. Finally, I should mention that compiling and running this code using electric fence hasn't picked up any buffer overflows, nor did -fbounds-check throw anything at me.

Recommended answer

The most probable cause is that you pass a scalar INTEGER as the actual status argument to MPI_RECV, when it should really be declared as an array with an implementation-specific size, available as the MPI_STATUS_SIZE constant:

INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status

or, equivalently:

INTEGER status(MPI_STATUS_SIZE)

The receive operation writes the message tag to one of the status fields (its implementation-specific index is available as the MPI_TAG constant, and the field value can be accessed as status(MPI_TAG)). If your status is simply a scalar INTEGER, the write runs past it and overwrites several other local variables. In your case it just so happens that nstartg sits just above status on the stack.
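
As a minimal sketch of a receive with a correctly sized status, the program below sends one integer from rank 0 to each worker and prints the fields the receive filled in. The program name and the payload variable are made up for illustration and are not from the question's code; the sketch also assumes the mpi module is available (with include 'mpif.h' the declaration of status is the same).

```fortran
program recv_with_status
    use mpi
    implicit none
    integer :: ierr, me, ntasks, n
    integer :: payload                    ! hypothetical data, for illustration only
    integer :: status(MPI_STATUS_SIZE)    ! an array, not a scalar integer

    call mpi_init(ierr)
    call mpi_comm_rank(mpi_comm_world, me, ierr)
    call mpi_comm_size(mpi_comm_world, ntasks, ierr)

    if (me == 0) then
        ! boss: send one value to every worker, using the worker's rank as the tag
        do n = 1, ntasks-1
            payload = 10*n
            call mpi_send(payload, 1, mpi_integer, n, n, mpi_comm_world, ierr)
        enddo
    else
        ! worker: the receive writes the source, tag and error fields into status
        call mpi_recv(payload, 1, mpi_integer, 0, me, mpi_comm_world, status, ierr)
        print *, 'rank', me, 'source', status(MPI_SOURCE), 'tag', status(MPI_TAG)
    endif

    call mpi_finalize(ierr)
end program recv_with_status
```

Because status now has room for every field, the tag lands in status(MPI_TAG) rather than in whichever variable happens to sit next to a scalar in memory.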

If you do not care about the receive status, you can pass the special constant MPI_STATUS_IGNORE instead.
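
A minimal sketch of the same kind of exchange with the status discarded follows. As before, the program name and the payload variable are hypothetical and the mpi module is assumed to be available.

```fortran
program recv_ignore_status
    use mpi
    implicit none
    integer :: ierr, me, ntasks, n
    integer :: payload                    ! hypothetical data, for illustration only

    call mpi_init(ierr)
    call mpi_comm_rank(mpi_comm_world, me, ierr)
    call mpi_comm_size(mpi_comm_world, ntasks, ierr)

    if (me == 0) then
        do n = 1, ntasks-1
            payload = 10*n
            call mpi_send(payload, 1, mpi_integer, n, n, mpi_comm_world, ierr)
        enddo
    else
        ! no status variable is declared; MPI_STATUS_IGNORE tells the
        ! library not to write any status fields at all
        call mpi_recv(payload, 1, mpi_integer, 0, me, mpi_comm_world, &
                      MPI_STATUS_IGNORE, ierr)
    endif

    call mpi_finalize(ierr)
end program recv_ignore_status
```

With no status variable declared, there is nothing for the receive to overrun.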
