mpi alters a variable it shouldn't
Question
I have some Fortran code that I'm parallelizing with MPI which is doing truly bizarre things. First, there's a variable nstartg that I broadcast from the boss process to all the workers:
call mpi_bcast(nstartg,1,mpi_integer,0,mpi_comm_world,ierr)
The variable nstartg is never altered again in the program. Later on, I have the boss process send eproc elements of an array edge to the workers:
if (me==0) then
  do n=1,ntasks-1
    ! determine the starting point estart and the number eproc of values to send
    call mpi_send(edge(estart),eproc,mpi_integer,n,n,mpi_comm_world,ierr)
  enddo
endif
with a matching receive statement if me is non-zero. (I've left out some other code for readability; there's a good reason I'm not using scatterv.)
Here's where things get weird: the variable nstartg gets altered to n instead of keeping its actual value. For example, on process 1, after the mpi_recv, nstartg = 1, and on process 2 it's equal to 2, and so forth. Moreover, if I change the code above to
call mpi_send(edge(estart),eproc,mpi_integer,n,n+1234567,mpi_comm_world,ierr)
and change the tag accordingly in the matching call to mpi_recv, then on process 1, nstartg = 1234568; on process 2, nstartg = 1234569, etc.
What on earth is going on? All I've changed is the tag that mpi_send/recv are using to identify the message; provided the tags are unique so that the messages don't get mixed up, this shouldn't change anything, and yet it's altering a totally unrelated variable.
On the boss process, nstartg is unaltered, so I can fix this by broadcasting it again, but that's hardly a real solution. Finally, I should mention that compiling and running this code using electric fence hasn't picked up any buffer overflows, nor did -fbounds-check throw anything at me.
Answer
The most probable cause is that you pass an INTEGER scalar as the actual status argument to MPI_RECV, when it should really be declared as an array whose implementation-specific size is available as the MPI_STATUS_SIZE constant:
INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status
or
INTEGER status(MPI_STATUS_SIZE)
The message tag is written to one of the status fields by the receive operation (its implementation-specific index is available as the MPI_TAG constant, and the field's value can be accessed as status(MPI_TAG)), and if your status is simply a scalar INTEGER, then several other local variables would get overwritten. In your case it simply happens that nstartg falls just above status on the stack.
If you do not care about the receive status, you can pass the special constant MPI_STATUS_IGNORE instead.