HPC job failed
Problem description
I created a cluster job through the job management console of an HPC cluster. The job includes an executable MPI application file.
I followed these steps:
In the job management console, click Add New Job.
Click Next, and on the task page, in the command field, write: mpiexec.exe myapp.exe
In the working directory field, write: \\headnode\myapp, the location of my exe file.
Then submit the job.
But the job failed.
Please help me.
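For reference, the console steps above correspond roughly to a command-line submission. The sketch below only builds and prints that command line; it does not run it. It assumes the HPC Pack `job submit` command with its `/workdir` parameter is available on the head node. The helper name `build_submit_command` is hypothetical, and the share path and executable name are the ones quoted in the question.

```python
def build_submit_command(work_dir, executable, launcher="mpiexec.exe"):
    """Return the argument list for a `job submit` call (assumed HPC Pack CLI).

    work_dir   -- working directory the task should start in
    executable -- the MPI application passed to the launcher
    launcher   -- the MPI launcher, mpiexec.exe in the question
    """
    return ["job", "submit", f"/workdir:{work_dir}", launcher, executable]


# Values taken from the question; nothing is executed here.
submit_cmd = build_submit_command(r"\\headnode\myapp", "myapp.exe")
print(" ".join(submit_cmd))
```

Printing the command first makes it easy to check that the working directory is a UNC path the compute nodes can reach, since a path that only resolves on the head node is a common reason for a task to fail at launch.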
Recommended answer
Hello,
There are several reasons an MPI job can fail, for example a wrong network mask, insufficient resources, or the MPI service being down. Before we can work out the root cause of the failure, could you please post the full error message here? You can find the failed job's ID and run the command: task view [jobid].1, or you can browse the job management UI to find the details of the failed job.
Thanks,
James
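As a rough illustration of the diagnostic step in the answer, the sketch below builds the `task view [jobid].1` command line for a given job ID, along with a `job view` command for the job as a whole. It only prints the commands rather than running them. The helper names and the job ID 42 are hypothetical; substitute the ID of the failed job shown in the job management UI.

```python
def build_task_view_command(job_id, task_id=1):
    """Return the argument list for `task view <jobid>.<taskid>`."""
    return ["task", "view", f"{job_id}.{task_id}"]


def build_job_view_command(job_id):
    """Return the argument list for `job view <jobid>`."""
    return ["job", "view", str(job_id)]


failed_job_id = 42  # hypothetical ID, replace with the real failed job ID

for cmd in (build_job_view_command(failed_job_id),
            build_task_view_command(failed_job_id)):
    # Run the printed line in a command prompt on the head node.
    print(" ".join(cmd))
```

The output of the task-level command is the part worth posting, since it normally carries the error message for the failed task.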