HPC job failed


Problem description


I created a cluster job through the Job Management console of an HPC cluster. The job runs an executable MPI application.

These are the steps I followed:

In the Job Management console, I clicked Add New Job.

Then I clicked Next and, on the task page, wrote this in the command field: mpiexec.exe myapp.exe

In the working directory field, I wrote \\headnode\myapp, which is the location of my exe file.

Then I submitted the job.

But the job failed.

Please help me.

Recommended answer

Hello,

An MPI job can fail for several reasons: for example, a wrong netmask, not enough resources, or the MPI service being down. Before we try to work out the root cause, could you please post the full error message here? You can find the failed job's ID and run the command task view [jobid].1, or you can browse the Job Management UI to find the details of the failed job.
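If you would rather collect that output from a script, here is a minimal Python sketch. It only wraps the task view [jobid].1 command mentioned above; the function names are made up for illustration, the job ID 42 is a placeholder, and it assumes the HPC command-line tools are on the PATH of the machine where it runs.

```python
import subprocess


def task_view_command(job_id: int, task_id: int = 1) -> list[str]:
    """Build the argument list for `task view <jobid>.<taskid>`.

    Hypothetical helper: it only formats the command quoted in the answer.
    """
    if job_id <= 0 or task_id <= 0:
        raise ValueError("job and task IDs must be positive integers")
    return ["task", "view", f"{job_id}.{task_id}"]


def view_failed_task(job_id: int, task_id: int = 1) -> str:
    """Run the command and return whatever text it prints.

    Assumes the HPC client utilities are installed and on PATH.
    """
    result = subprocess.run(
        task_view_command(job_id, task_id),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is still worth reading
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    # 42 is a placeholder; substitute the ID of your failed job.
    print(view_failed_task(42))
```

Pasting the printed text into your reply should give us enough to narrow down the cause.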

Thanks,

James

