mpirun: Unrecognized argument mca

Problem description

I have a c++ solver which I need to run in parallel using the following command:

nohup mpirun -np 16 ./my_exec > log.txt &

This command will run my_exec independently on the 16 processors available on my node. This used to work perfectly.

Last week, the HPC department performed an OS upgrade and now, when launching the same command, I get two warning messages (for each processor). The first one is:

--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered.  You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:              tamnun
  Registerable memory:     32768 MiB
  Total memory:            98294 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------

I then get an output from my code, which tells me it thinks I am launching only 1 realization of the code (Nprocs = 1 instead of 16).

# MPI IS ON; Nprocs = 1
Filename = ../input/odtParam.inp

# MPI IS ON; Nprocs = 1

***** Error, process 0 failed to create ../data/data_0/, or it was already there

Finally, the second warning message is:

--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          tamnun (PID 17446)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------

After looking around online, I tried following the warning messages' advice by setting the MCA parameter mpi_warn_on_fork to 0 with the command:

nohup mpirun --mca mpi_warn_on_fork 0 -np 16 ./my_exec > log.txt &

which yielded the following error message:

[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments

I am using RedHat 6.7 (Santiago). I contacted the HPC department, but since I am in a university, this issue may take them a day or two to respond. Any help or guidance would be appreciated.

EDIT in response to the answer below:

Indeed, I was compiling my code with Open MPI's mpic++ while running the executable with Intel's mpirun command, hence the error (after the OS upgrade, Intel's mpirun was set as the default). I had to put the path to Open MPI's mpirun at the beginning of the $PATH environment variable.
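
To confirm which launcher the shell actually resolves, and to put Open MPI's directory first, something like the sketch below can be used; the install prefix is only a placeholder for wherever Open MPI lives on this cluster (on many HPC systems a "module load" command is the preferred way to adjust $PATH):

which mpirun
mpirun --version     # Open MPI reports "mpirun (Open MPI) x.y.z"; MPICH- and Intel MPI-based launchers print a Hydra/Intel banner instead

export PATH=/path/to/openmpi/bin:$PATH    # placeholder prefix, not the cluster's real install path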

The code now runs as expected, but I still get the first warning message above (it no longer advises me to use the MCA parameter mpi_warn_on_fork). I think (but am not sure) this is an issue I need to resolve with the HPC department.
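
For reference, the FAQ item linked in that first warning attributes such a limit to the InfiniBand driver's memory translation table settings; on Mellanox mlx4 hardware (an assumption about this cluster) the registerable memory is roughly 2^log_num_mtt * 2^log_mtts_per_seg * page_size, and only an administrator can raise those module parameters. A sketch of what the HPC staff might check:

cat /sys/module/mlx4_core/parameters/log_num_mtt        # read-only checks, no root needed; mlx4_core is an assumed module name
cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
getconf PAGE_SIZE

# Raising the limit means editing /etc/modprobe.d and reloading the driver (root only), e.g.:
# options mlx4_core log_num_mtt=24    (example value only; the product above should cover at least the node's ~96 GB of RAM)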

Recommended answer

[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
                                  ^^^^^
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
                                                  ^^^^^

You are using MPICH in the last case. MPICH is not Open MPI and its process launcher does not recognize the --mca parameter that is specific to Open MPI (MCA stands for Modular Component Architecture - the basic framework that Open MPI is built upon). A typical case of a mix-up of multiple MPI implementations.
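
Once Open MPI's mpirun is the one found first in $PATH, the warning's advice works as written. For reference, Open MPI accepts MCA parameters in several equivalent ways (command-line flag, environment variable, or the per-user parameter file):

nohup mpirun --mca mpi_warn_on_fork 0 -np 16 ./my_exec > log.txt &

export OMPI_MCA_mpi_warn_on_fork=0                               # environment form, picked up by any Open MPI launch

echo "mpi_warn_on_fork = 0" >> $HOME/.openmpi/mca-params.conf    # per-user MCA parameter file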
