mpirun: Unrecognized argument mca
Question
I have a C++ solver which I need to run in parallel using the following command:
nohup mpirun -np 16 ./my_exec > log.txt &
This command will run my_exec independently on the 16 processors available on my node. This used to work perfectly.
Last week, the HPC department performed an OS upgrade and now, when launching the same command, I get two warning messages (for each processor). The first one is:
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:           tamnun
  Registerable memory:  32768 MiB
  Total memory:         98294 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
I then get an output from my code, which tells me it thinks I am launching only 1 realization of the code (Nprocs = 1 instead of 16).
# MPI IS ON; Nprocs = 1
Filename = ../input/odtParam.inp

# MPI IS ON; Nprocs = 1

***** Error, process 0 failed to create ../data/data_0/, or it was already there
Finally, the second warning message is:
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          tamnun (PID 17446)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
After looking around online, I tried following the warning message's advice by setting the MCA parameter mpi_warn_on_fork to 0 with the command:
nohup mpirun --mca mpi_warn_on_fork 0 -np 16 ./my_exec > log.txt &
which yielded the following error message:
[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
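(Side note: Open MPI also picks up MCA parameters from OMPI_MCA_-prefixed environment variables, so the same setting can be expressed without touching the mpirun command line, for example:

export OMPI_MCA_mpi_warn_on_fork=0   # equivalent to --mca mpi_warn_on_fork 0
nohup mpirun -np 16 ./my_exec > log.txt &

This of course assumes that the mpirun actually being invoked is Open MPI's.)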
I am using RedHat 6.7 (Santiago). I contacted the HPC department, but since I am at a university, it may take them a day or two to respond. Any help or guidance would be appreciated.
Edit, to answer the questions below:
Indeed, I was compiling my code with Open MPI's mpic++ while running the executable with Intel's mpirun command, hence the error (after the OS upgrade, Intel's mpirun was set as the default). I had to put the path to Open MPI's mpirun at the beginning of the $PATH environment variable.
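Roughly, that amounted to something like the following (the Open MPI install prefix below is only a placeholder; the real path depends on how the cluster installed it):

# see which launcher is currently first in $PATH
which mpirun
mpirun --version

# prepend Open MPI's bin directory (example path only; adjust to the actual install)
export PATH=/usr/lib64/openmpi/bin:$PATH
which mpirun   # should now point to Open MPI's mpirun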
The code now runs as expected, BUT I still get the first warning message above (it no longer advises me to use the MCA parameter mpi_warn_on_fork). I think (but am not sure) it is an issue I need to resolve with the HPC department.
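For the remaining registered-memory warning, the FAQ item linked in the message boils down to checking the locked-memory limit and the HCA driver's MTT settings. A rough sketch of what one might check (the mlx4_core module and its parameters are an assumption about the hardware; other HCAs use different knobs, and changing them is usually an admin task):

# locked-memory limit visible to MPI processes; the FAQ recommends "unlimited"
ulimit -l

# on Mellanox ConnectX (mlx4) adapters, registerable memory is governed by
# these module parameters (shown here for inspection only)
cat /sys/module/mlx4_core/parameters/log_num_mtt
cat /sys/module/mlx4_core/parameters/log_mtts_per_seg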
Answer
[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
                                  ^^^^^
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
                                                  ^^^^^
You are using MPICH in the last case. MPICH is not Open MPI, and its process launcher does not recognize the --mca parameter, which is specific to Open MPI (MCA stands for Modular Component Architecture, the basic framework that Open MPI is built upon). A typical case of a mix-up of multiple MPI implementations.
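A quick way to see which implementation each tool belongs to (just a sketch; the exact banner text varies between versions):

which mpirun mpic++
mpirun --version     # Open MPI's launcher reports "mpirun (Open MPI) x.y.z"; MPICH's Hydra launcher identifies itself as HYDRA
mpic++ --showme      # --showme is Open MPI-specific and prints the underlying compiler command line

Making sure the compile-time wrapper and the run-time launcher come from the same MPI installation (via environment modules or $PATH, as described in the edit above) avoids this class of error.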