Can someone explain this valgrind error with Open MPI?
Question
My basic question is about how suppression files work in valgrind. Much of the documentation I have read points to using the following with Open MPI versions > 1.5 (mine is 1.6):
mpirun -np 2 valgrind --suppressions=/usr/share/openmpi/openmpi-valgrind.supp --track-origins=yes ./myprog
However, when I run it like this, the output contains over 600 errors! The errors are these two, repeated over and over. With my current understanding of valgrind and MPI, I don't know how to interpret either of them.
==8821== Address 0xad5e4d7 is 87 bytes inside a block of size 128 alloc'd
==8821== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8821== by 0x6348C52: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
==8821== by 0x6349AF1: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
==8821== by 0x6349B81: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
==8821== by 0x7DA5B9C: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
==8821== by 0x7DA52F4: ??? (in /usr/lib/openmpi/lib/openmpi/mca_grpcomm_bad.so)
==8821== by 0x5082AF2: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.2)
==8821== by 0x50A33FA: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.0.0.2)
==8821== by 0x408AB5: main (test_send-receive.cpp:8)
==8821== Uninitialised value was created by a heap allocation
==8821== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8821== by 0x635FE2B: ??? (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
==8821== by 0x6360634: opal_ifcount (in /usr/lib/openmpi/lib/libopen-pal.so.0.0.0)
==8821== by 0x81B36AA: ??? (in /usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so)
==8821== by 0x5C01EE2: mca_oob_base_init (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
==8821== by 0x7FA97FB: ??? (in /usr/lib/openmpi/lib/openmpi/mca_rml_oob.so)
==8821== by 0x5C083E4: orte_rml_base_select (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
==8821== by 0x5BF5EC4: orte_ess_base_app_setup (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
==8821== by 0x7BA1EAE: ??? (in /usr/lib/openmpi/lib/openmpi/mca_ess_env.so)
==8821== by 0x5BDDB72: orte_init (in /usr/lib/openmpi/lib/libopen-rte.so.0.0.0)
==8821== by 0x50822E0: ??? (in /usr/lib/openmpi/lib/libmpi.so.0.0.2)
==8821== by 0x50A33FA: PMPI_Init (in /usr/lib/openmpi/lib/libmpi.so.0.0.2)
The code that produces these errors is:
#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[]) {
    /* init MPI */
    MPI_Init(&argc, &argv);
    int myid;
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    int i;
    if (myid == 0) {
        /* rank 0: fill a buffer and send it to rank 1 with tag 1 */
        double *d = new double[10];
        for (i = 0; i < 10; i++) {
            d[i] = i + 1.0;
        }
        MPI_Send(d,
                 10,
                 MPI_DOUBLE,
                 1,
                 1,
                 MPI_COMM_WORLD);
        delete[] d;
    } else {
        /* other ranks: receive 10 doubles from rank 0 and print them */
        MPI_Status status;
        double *c = new double[10];
        MPI_Recv(c,
                 10,
                 MPI_DOUBLE,
                 0,
                 MPI_ANY_TAG,
                 MPI_COMM_WORLD,
                 &status);
        for (i = 0; i < 10; i++) {
            printf("%f\n", c[i]);
        }
        delete[] c;
    }
    MPI_Finalize();
    return 0;
}
Also, this code runs fine and outputs the expected results. Am I misunderstanding how the data is sent over the network, or is something else going on here that I don't understand?
Sorry about the length of the post; you guys rock for even reading this far.
Recommended answer
It's quite possible that our suppression file is not up to date in OMPI v1.6. :-\
You should report this on the OMPI mailing list. See http://www.open-mpi.org/community/lists/ompi.php.